Skip to end of metadata
Go to start of metadata

Test drive parsers

My experience in using jparsec, is that test driven development really helps. It is unfortunately hard to debug a complex parser object, but it is much easier for a small vanilla one. If the whole parser program is built with test cases that tests every tiny piece of logic, detecting and debugging an error becomes an easy task.

Start from writing unit test case for every production rule before implementing it. Run the test case, see the red bar, and then implement the production rule to make turn it to green.

Sometimes when grammar is recursive (In SQL syntax, for example, query includes expression and predicate, predicate includes expression and relation, and finally expression includes predicate), it may seem difficult to break down the grammar into smaller independent pieces that's unit-testable, because of the inter-dependencies.

What we can do is, to create separate methods for relation, expression and predicate parsers. For each one that requires a dependency on others, pass the dependent parsers as parameters to the methods as following:

These methods (and their sub functions should they be further refactored) can then be unit-tested independently by passing them dummy dependencies. For example:

These functions, eventually will be assemblied together using lazy parsers as in:

Gotchas

The number 1 gotcha in jparsec is ambiguity. The following code will fail:

Looking from the grammar, it seems simple and valid. The first element of the list is an integer, recognized by Scanners.INTEGER; and the second element is a decimal, recognized by Scanners.DECIMAL.

The reason this fails is because Parsers.or() runs alternative parsers from left to right. When "2.5" is encountered, Scanners.INTEGER is tried first, which succeeds by matching "2". Since the first alternative succeeds, the rest won't get a chance to run, leaving the ".5" part unparsed.

The fix is easy: just flip the order of the two alternatives:

Another fix is to use the longer() combinator. It will force Scanners.DECIMAL to be run, even if the first alternative succeeds. The down side though, is that longer() runs slower.


Created by benyu benyu
On Tue Oct 24 23:13:21 CDT 2006
Using TimTam

  • No labels