Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 5.3

Test drive parsers

My experience in using jparsec, is that test driven development really helps. It is unfortunately hard to debug a complex parser object, but it is much easier for a small vanilla one. If the whole parser program is built with test cases that tests every tiny piece of logic, detecting and debugging an error becomes an easy task.

Start from writing unit test case for every production rule before implementing it. Run the test case, see the red bar, and then implement the production rule to make turn it to green.

Sometimes when grammar is recursive (In SQL syntax, for example, query includes expression and predicate, predicate includes expression and relation, and finally expression includes predicate), it may seem difficult to break down the grammar into smaller independent pieces that's unit-testable, because of the inter-dependencies.

What we can do is, to create separate methods for relation, expression and predicate parsers. For each one that requires a dependency on others, pass the dependent parsers as parameters to the methods as following:

Code Block
class Expressions {
  static final Parser<Expression> NUMBER_EXPRESSION = ...; // unit testable
  static final Parser<Expression> IDENTIFIER_EXPRESSION = ...; // unit testable
  ... // other atomic expression parsers
  
  static Parser<Expression> expressionParser(Parser<Predicate> predicateParser){
    ...
    //to create expression parser.
  }
}

class Prediates {
  static final Parser<Expression> BOOL_LITERAL = ...; // unit testable
  ... // other atomic predicate parsers
  
  static Parser<Predicate> predicateParser(Parser<Expression> exprParser, Parser<Relation> relParser){
    ...
    //to create predicate parser.
  }
}

class Relations {
  static final Parser<Relation> TABLE = ...; // unit testable
  ... // other atomic relation parser
  
  Parser<Relation> relationParser(Parser<Expression> exprParser, Parser<Predicate> predParser){
    ...
    //to create relation parser.
  }
}

These methods (and their sub functions should they be further refactored) can then be unit-tested independently by passing them dummy dependencies. For example:

Code Block
Parser<Expression> parser = expressionParser(Predicates.BOOL_LITERAL);
assertEquals(..., parser.parse(...));

These functions, eventually will be assemblied together using lazy parsers as in:

Code Block
Parser<Relation> getRelationParser(){
  Parser.Reference<Predicate> predidateRef = Parser.newReference();
  Parser.Reference<Relation> relationRef = Parser.newReference();
  Parser<Expression> expr = expressionParser(predicateRef.lazy());
  Parser<Predicate> pred = predicateParser(expr, relationRef.lazy());
  Parser<Relation> rel = relationParser(expr, pred);
  predicateRef.set(pred);
  relationRef.set(rel);
  return rel;
}

Gotchas

The number 1 gotcha in jparsec is ambiguity. The following code will fail:

Code Block
Scanners.INTEGER.or(Scanners.DECIMAL).sepBy(Scanners.isChar(',')).parse("1,2.5");

Looking from the grammar, it seems simple and valid. The first element of the list is an integer, recognized by Scanners.INTEGER; and the second element is a decimal, recognized by Scanners.DECIMAL.

The reason this fails is because Parsers.or() runs alternative parsers from left to right. When "2.5" is encountered, Scanners.INTEGER is tried first, which succeeds by matching "2". Since the first alternative succeeds, the rest won't get a chance to run, leaving the ".5" part unparsed.

The fix is easy: just flip the order of the two alternatives:

Code Block
Scanners.DECIMAL.or(Scanners.INTEGER).sepBy(Scanners.isChar(',')).parse("1,2.5");

Another fix is to use the longer() combinator. It will force Scanners.DECIMAL to be run, even if the first alternative succeeds. The down side though, is that longer() runs slower.


Created by benyu benyu
On Tue Oct 24 23:13:21 CDT 2006
Using TimTam