Like any higher-order combinator framework, an rparsec program is not easy to debug. When an error does occur, especially in a complex production rule, tracking it down can take considerable time. Though I do not think debugging rparsec is any harder than debugging a parser generator, this is a real problem that we have to face.
My experience with rparsec (and jparsec too) is that test-driven development really helps. A complex parser object is hard to debug, but a small, simple one is much easier. If the whole parser program is built up with test cases that cover every small piece of logic, detecting and fixing an error becomes much easier.
Start by writing a unit test for every production rule before implementing it. Run the test, see the red bar, then implement the production rule to turn it green.
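A minimal sketch of the test-first workflow, using Minitest (bundled with Ruby). `integer_rule` is a hypothetical production rule implemented with a plain regular expression rather than an rparsec parser, so the sketch runs without the gem; in a real program the tests would exercise your parser object instead.

```ruby
require 'minitest/autorun'

# Hypothetical production rule: returns the unsigned integer literal at the
# start of the input, or nil. In a real program this would be an rparsec
# parser; a regex keeps the sketch self-contained.
def integer_rule(input)
  input[/\A\d+/]
end

# Written before integer_rule existed: run it, see it fail (red bar),
# then implement the rule until every test passes (green bar).
class IntegerRuleTest < Minitest::Test
  def test_accepts_digits
    assert_equal "123", integer_rule("123")
  end

  def test_stops_before_decimal_point
    assert_equal "1", integer_rule("1.0")
  end

  def test_rejects_non_digits
    assert_nil integer_rule("abc")
  end
end
```

Giving each rule its own small test class means that when a composed parser misbehaves, the tests for its pieces narrow down which rule is at fault.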
That said, when you do encounter errors, here are a few things to check.
- Look-ahead
Parsers#sum and Parser#plus are subject to the default look-ahead value: if an alternative fails after consuming some input, the remaining alternatives do not get a chance to run. The solution is to use the "|" operator or Parsers#alt, which try each alternative in turn until one succeeds, regardless of input consumption. Parser#atomize and Parser#lookahead are other tools for getting around this problem.
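The difference between plus/sum and alt can be seen in a tiny toy model written in plain Ruby. This is not rparsec's implementation: `str`, `plus`, `alt` and `atomic` are hypothetical stand-ins, and a parser is simply a lambda returning `[:ok, new_pos]` or `[:fail, pos_reached]`. The alternative "ab" consumes the "a" of "ac" before failing, which is exactly the case that stops plus.

```ruby
# Toy model, not rparsec: a parser is a lambda (input, pos) that returns
# [:ok, new_pos] on success or [:fail, pos_reached] on failure.
def str(s)
  lambda do |input, pos|
    s.each_char.with_index do |ch, i|
      return [:fail, pos + i] unless input[pos + i] == ch
    end
    [:ok, pos + s.length]
  end
end

# Models plus/sum under the default look-ahead: the next alternative runs
# only if the previous one failed without consuming any input.
def plus(*alternatives)
  lambda do |input, pos|
    result = nil
    alternatives.each do |parser|
      result = parser.call(input, pos)
      return result if result[0] == :ok || result[1] > pos
    end
    result
  end
end

# Models alt / "|": every alternative is retried from the same start
# position until one succeeds, regardless of input consumption.
def alt(*alternatives)
  lambda do |input, pos|
    result = nil
    alternatives.each do |parser|
      result = parser.call(input, pos)
      return result if result[0] == :ok
    end
    result
  end
end

# Models the idea behind atomize: a failure is reported at the start
# position, so it counts as "no input consumed".
def atomic(parser)
  lambda do |input, pos|
    result = parser.call(input, pos)
    result[0] == :ok ? result : [:fail, pos]
  end
end

p plus(str("ab"), str("ac")).call("ac", 0)          # prints [:fail, 1]
p alt(str("ab"), str("ac")).call("ac", 0)           # prints [:ok, 2]
p plus(atomic(str("ab")), str("ac")).call("ac", 0)  # prints [:ok, 2]
```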
- Order of alternatives
Suppose we have two parsers, integer and number, that parse integer literals and decimal literals respectively. "(integer|number) << eof" will fail on the input "1.0": "integer|number" first tries integer, which successfully recognizes "1". Since the first alternative succeeded, the number parser never gets a chance to run, and when eof is then run against the remaining ".0", we get an error.
The solution is either to place "number" as the first alternative, or to use "longer(integer, number)", which tries both alternatives and prefers the one with the longer match.
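The "1.0" example can be replayed in a similar toy model. It is plain Ruby, not rparsec: `regex`, `alt`, `longer` and `then_eof` are hypothetical helpers, and a parser returns the end position of its match or nil.

```ruby
# Toy model, not rparsec: a parser is a lambda (input, pos) that returns
# the position after its match, or nil on failure.
def regex(re)
  ->(input, pos) { (m = re.match(input[pos..-1])) && pos + m[0].length }
end

# Models "p << eof": run p, then require that no input is left.
def then_eof(parser)
  ->(input, pos) { (r = parser.call(input, pos)) && (r == input.length ? r : nil) }
end

# Models "|": the first alternative that succeeds wins; later ones never run.
def alt(*alternatives)
  lambda do |input, pos|
    alternatives.each do |parser|
      r = parser.call(input, pos)
      return r if r
    end
    nil
  end
end

# Models longer(...): run every alternative and keep the longest match.
def longer(*alternatives)
  ->(input, pos) { alternatives.map { |parser| parser.call(input, pos) }.compact.max }
end

def parses?(parser, input)
  !parser.call(input, 0).nil?
end

integer = regex(/\A\d+/)       # integral literal, e.g. "1"
number  = regex(/\A\d+\.\d+/)  # decimal literal, e.g. "1.0"

p parses?(then_eof(alt(integer, number)), "1.0")     # prints false
p parses?(then_eof(alt(number, integer)), "1.0")     # prints true
p parses?(then_eof(longer(integer, number)), "1.0")  # prints true
```

Reordering works because a failed regex match consumes nothing here; longer is the more robust choice when neither alternative is clearly "more specific".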
- Token identifiers
Make sure the token identifier you use in the syntactic parser matches the one used in lexical analysis. If you use different identifiers for number tokens and integer tokens, don't expect token(:number) to match :integer tokens. Regular words are identified by :word by default.
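A toy model of token matching, assuming the lexer produces `[kind, text]` pairs; this representation and the `token` helper are hypothetical, not rparsec's internals. It shows that a kind mismatch fails even when the token text looks right.

```ruby
# Toy model, not rparsec: the lexer's output is an array of [kind, text]
# pairs, and token(kind) matches one token only if its kind is `kind`.
def token(kind)
  lambda do |tokens, pos|
    tok = tokens[pos]
    tok.is_a?(Array) && tok[0] == kind ? pos + 1 : nil
  end
end

# Hypothetical lexer output for "copy 42", where the lexer tagged the
# literal as :integer and plain words as :word.
tokens = [[:word, "copy"], [:integer, "42"]]

p token(:word).call(tokens, 0)     # prints 1
p token(:number).call(tokens, 1)   # prints nil: kind mismatch
p token(:integer).call(tokens, 1)  # prints 2
```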
- Right input
When your syntactic parser complains with something like "'c' encountered", where 'c' is the first character of the current word ("copy", for example), you are most likely feeding it a string as input instead of the expected token array.
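A similar toy `token` helper (hypothetical, not rparsec's) shows where such a message comes from: indexing a String yields single characters, so the value the parser finds is "c" rather than the word "copy". The message wording below is made up for the model.

```ruby
# Toy model, not rparsec: token(kind) expects an array of [kind, text]
# pairs and reports what it found when the match fails.
def token(kind)
  lambda do |input, pos|
    tok = input[pos]
    return [:ok, pos + 1] if tok.is_a?(Array) && tok[0] == kind

    # On a String input, input[pos] is a one-character String such as "c".
    found = tok.is_a?(Array) ? tok[1] : tok
    [:error, "#{kind} expected, '#{found}' encountered"]
  end
end

p token(:word).call([[:word, "copy"]], 0)  # prints [:ok, 1]
p token(:word).call("copy", 0)             # prints [:error, "word expected, 'c' encountered"]
```

A cheap safeguard is to check that the input is an Array before handing it to the syntactic parser.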
Utilities that may help in troubleshooting:
- Parsers#watch can be used to print trace messages as well as to monitor the current input.
- Parsers#get_index is a parser that returns the index of the current input position, so you can tell where you are.
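In the spirit of Parsers#watch and Parsers#get_index, a hypothetical `traced` wrapper in a toy model can log each named sub-parser's start index, outcome and remaining input. This is not rparsec's API; it only sketches the kind of information those utilities let you inspect.

```ruby
# Toy model, not rparsec: a parser is a lambda (input, pos) that returns
# the position after its match, or nil on failure.
def digit_run
  ->(input, pos) { (m = /\A\d+/.match(input[pos..-1])) && pos + m[0].length }
end

def lit(s)
  ->(input, pos) { input[pos, s.length] == s ? pos + s.length : nil }
end

# Runs the parsers one after another; stops at the first failure.
def seq(*parsers)
  ->(input, pos) { parsers.reduce(pos) { |at, parser| at && parser.call(input, at) } }
end

# Hypothetical tracing wrapper: records the name, start index, outcome and
# remaining input each time the wrapped parser runs.
def traced(name, parser, log)
  lambda do |input, pos|
    result = parser.call(input, pos)
    outcome = result ? "ok -> #{result}" : "failed"
    log << "#{name} @#{pos}: #{outcome}, remaining #{input[pos..-1].inspect}"
    result
  end
end

log = []
decimal = seq(traced("int", digit_run, log),
              traced("dot", lit("."), log),
              traced("frac", digit_run, log))
decimal.call("1,5", 0)
puts log
# int @0: ok -> 1, remaining "1,5"
# dot @1: failed, remaining ",5"
```

The last log line points straight at the failing rule and the input it choked on, which is usually what you need before reaching for a debugger.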
Created by benyu benyu
On Mon Oct 23 23:25:06 CDT 2006