The Problem

In C++ before C++11, nested templates had a special requirement: you could not write "Foo<Bar<int>>", because ">>" was tokenized as the right-shift operator rather than as two adjacent ">" characters, so you had to write "Foo<Bar<int> >" instead. Parsing it properly would have made the already complicated grammar even more so. C++11 has since lifted this restriction.

Java, however, doesn't have this limitation: "List<List<String>>" is perfectly legal. It is still the job of the parser to properly disambiguate ">>" in an expression from ">>" closing nested generic types.
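As a concrete illustration (plain Java, no jparsec involved), the snippet below compiles and uses ">>" in both roles inside one method. The class and method names are made up for this example.

```java
import java.util.Arrays;
import java.util.List;

final class TwoMeanings {

  static int[] shifts() {
    // Here ">>" closes two type-argument lists at once.
    List<List<Integer>> grid = List.of(List.of(64, -64));
    // Here ">>" and ">>>" are shift operators.
    int positive = grid.get(0).get(0) >> 2;   // 64 >> 2 == 16
    int signed = grid.get(0).get(1) >> 2;     // arithmetic shift keeps the sign: -16
    int unsigned = grid.get(0).get(1) >>> 28; // logical shift fills with zeros: 15
    return new int[] {positive, signed, unsigned};
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(shifts())); // [16, -16, 15]
  }
}
```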

One possible solution is to muck around with the grammar so that a ">>" token is accepted both in generic types and in expressions. The cost is that several not-so-intuitive production rules need to be introduced to work around the ambiguity.

Solution in jparsec

Jparsec provides a simpler solution (check out the Java parser sample in the source code).

In the lexical analysis phase, every "<" and ">" character is tokenized individually. There is never a token for "<<", ">>", or ">>>", so when parsing generic types the parser cannot be confused by a ">>" token.

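For example, "List<List<String>>" is lexed as "List", "<", "List", "<", "String", ">", ">": the two closing brackets arrive as two separate ">" tokens, which is exactly what the generic-type grammar expects.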
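A minimal standalone sketch of such a lexer follows. It does not use jparsec: the BracketLexer class and its Token record (token text plus 0-based source index) are hypothetical stand-ins, and to stay short it only recognizes identifiers, integer literals, and single-character punctuation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical lexer sketch (not jparsec's API): '<' and '>' are always single tokens. */
final class BracketLexer {

  /** Token text plus its 0-based character index in the source. */
  record Token(String text, int index) {}

  static List<Token> tokenize(String source) {
    List<Token> tokens = new ArrayList<>();
    int i = 0;
    while (i < source.length()) {
      char c = source.charAt(i);
      int start = i;
      if (Character.isWhitespace(c)) {
        i++; // whitespace produces no token
        continue;
      }
      if (Character.isJavaIdentifierStart(c)) {
        // Identifier: consume the run of identifier characters.
        while (i < source.length() && Character.isJavaIdentifierPart(source.charAt(i))) {
          i++;
        }
      } else if (Character.isDigit(c)) {
        // Integer literal: consume the run of digits.
        while (i < source.length() && Character.isDigit(source.charAt(i))) {
          i++;
        }
      } else {
        // Any other character, '<' and '>' included, is a token on its own,
        // so "<<", ">>" and ">>>" are never produced.
        i++;
      }
      tokens.add(new Token(source.substring(start, i), start));
    }
    return tokens;
  }

  public static void main(String[] args) {
    for (Token t : tokenize("List<List<String>>")) {
      System.out.println(t.text() + " @ " + t.index());
    }
  }
}
```

The expression "a >> 16" comes out the same way: ">" at index 2 and ">" at index 3, with nothing in the token stream marking them as a shift yet.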

That makes it really simple to parse nested generic types with pointy brackets. But what about the "<<", ">>", and ">>>" operators used in expressions?

The lexer has no idea what context the current token is in, but during syntactic analysis we know whether we are parsing a generic type or an expression. In an expression, we treat three adjacent ">" tokens as a single ">>>" operator, and two adjacent ">" tokens as a single ">>" operator.

By adjacent, I mean that they have to be next to each other in the original source: if the first ">" character appears at line 7, column 6, the next one has to be at line 7, column 7. In other words, their physical indexes in the original source are consecutive, so "a > > b" (with a space in between) is not a shift.

Luckily, the Token class carries its physical index in the original source. By using the next() combinator, we can handle the adjacency specially: take the list of tokens returned by a parser and check that they are adjacent, failing the parse if they are not.
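The adjacency test itself can be sketched in plain Java. The Token record here is a hypothetical stand-in for jparsec's Token class, assumed to carry the token text and its 0-based index in the source. Inside jparsec, the test would presumably live in the function handed to next(), with a false result turned into a failing parser; that wiring is not shown.

```java
import java.util.List;

final class AdjacencyCheck {

  /** Hypothetical stand-in for jparsec's Token: text plus 0-based source index. */
  record Token(String text, int index) {
    /** Index just past this token's last character. */
    int end() {
      return index + text.length();
    }
  }

  /** True if every token starts exactly where the previous one ends. */
  static boolean areAdjacent(List<Token> tokens) {
    for (int i = 1; i < tokens.size(); i++) {
      if (tokens.get(i).index() != tokens.get(i - 1).end()) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // "x >> 2": the '>' tokens at indexes 2 and 3 touch, so this is a shift.
    System.out.println(areAdjacent(List.of(new Token(">", 2), new Token(">", 3))));
    // "x > > 2": a space at index 3 separates the two '>' tokens.
    System.out.println(areAdjacent(List.of(new Token(">", 2), new Token(">", 4))));
  }
}
```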

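Then we can use the list() combinator to turn the operator string into a parser that returns a list of tokens: one single-character token parser per character of the operator, run in sequence, with the adjacency check applied to the result.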

Calling adjacent(">>>") then gives us a parser for three adjacent ">" tokens. Everything else in the grammar can stay as simple as it should be.
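The following sketch models the list-then-check idea without jparsec. TokenParser, term(), sequence() and adjacent() are hypothetical names chosen for this example: a TokenParser returns the position just after a match, or -1 on failure, and sequence() plays the role that list() plays in the text.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins for the combinators discussed above; not the jparsec API. */
final class AdjacentOperator {

  /** Token text plus its 0-based character index in the source. */
  record Token(String text, int index) {}

  /** Parses from pos; returns the position after the match, or -1 on failure. */
  interface TokenParser {
    int parse(List<Token> tokens, int pos);
  }

  /** Matches exactly one token with the given text. */
  static TokenParser term(String text) {
    return (tokens, pos) ->
        pos < tokens.size() && tokens.get(pos).text().equals(text) ? pos + 1 : -1;
  }

  /** Runs the parsers one after another; fails as soon as one fails. */
  static TokenParser sequence(List<TokenParser> parsers) {
    return (tokens, pos) -> {
      int current = pos;
      for (TokenParser p : parsers) {
        current = p.parse(tokens, current);
        if (current < 0) {
          return -1;
        }
      }
      return current;
    };
  }

  /** One term per character of op, then a check that the matched tokens touch. */
  static TokenParser adjacent(String op) {
    List<TokenParser> parts = new ArrayList<>();
    for (char c : op.toCharArray()) {
      parts.add(term(String.valueOf(c)));
    }
    TokenParser chars = sequence(parts);
    return (tokens, pos) -> {
      int end = chars.parse(tokens, pos);
      if (end < 0) {
        return -1;
      }
      for (int i = pos + 1; i < end; i++) {
        Token prev = tokens.get(i - 1);
        if (tokens.get(i).index() != prev.index() + prev.text().length()) {
          return -1; // gap in the source, e.g. "> >" is not a shift
        }
      }
      return end;
    };
  }

  public static void main(String[] args) {
    List<Token> tight = List.of(new Token(">", 2), new Token(">", 3), new Token(">", 4));
    List<Token> spaced = List.of(new Token(">", 2), new Token(">", 4), new Token(">", 5));
    System.out.println(adjacent(">>>").parse(tight, 0));  // 3
    System.out.println(adjacent(">>>").parse(spaced, 0)); // -1
  }
}
```

Note that adjacent(">>") also succeeds on the first two tokens of tight and returns 2, which is the prefix problem described next.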

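One catch is that ">>" is a prefix of ">>>", so the ">>" parser must not succeed on the first two characters of a ">>>". Getting the parser for the ">>" operator therefore takes a little bit of tweaking, for instance rejecting two adjacent ">" tokens when a third adjacent ">" follows.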
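One way to express that tweak, sketched in plain Java with the same kind of hypothetical Token record (text plus 0-based index): match two adjacent ">" tokens, then fail if the second one is immediately followed by a third adjacent ">". The method names are illustrative rather than jparsec's API, and this may differ from the tweak used in the jparsec Java parser sample.

```java
import java.util.List;

/** Hypothetical sketch of the ">>" tweak; names are illustrative, not jparsec API. */
final class ShiftOperators {

  /** Token text plus its 0-based character index in the source. */
  record Token(String text, int index) {}

  /** Position after `count` adjacent '>' tokens starting at pos, or -1. */
  static int adjacentGreater(List<Token> tokens, int pos, int count) {
    if (pos < 0 || pos + count > tokens.size()) {
      return -1;
    }
    for (int i = pos; i < pos + count; i++) {
      Token t = tokens.get(i);
      boolean touchesPrevious = i == pos || t.index() == tokens.get(i - 1).index() + 1;
      if (!t.text().equals(">") || !touchesPrevious) {
        return -1;
      }
    }
    return pos + count;
  }

  /** ">>>": three adjacent '>' tokens. */
  static int unsignedShift(List<Token> tokens, int pos) {
    return adjacentGreater(tokens, pos, 3);
  }

  /** ">>", but only when it is not the first two characters of a ">>>". */
  static int signedShift(List<Token> tokens, int pos) {
    int end = adjacentGreater(tokens, pos, 2);
    if (end < 0) {
      return -1;
    }
    // Reject if the second '>' is immediately followed by a third adjacent '>'.
    return adjacentGreater(tokens, end - 1, 2) >= 0 ? -1 : end;
  }

  public static void main(String[] args) {
    // The '>' tokens of "a >>> 3" and of "a >> 3".
    List<Token> triple = List.of(new Token(">", 2), new Token(">", 3), new Token(">", 4));
    List<Token> pair = List.of(new Token(">", 2), new Token(">", 3));
    System.out.println(signedShift(triple, 0));   // -1
    System.out.println(unsignedShift(triple, 0)); // 3
    System.out.println(signedShift(pair, 0));     // 2
  }
}
```

Where alternatives are tried in a fixed order, trying ">>>" before ">>" would also avoid the prefix match; the explicit check has the advantage of not depending on that order.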

And that's about it. The same technique applies to "<<" (Java has no "<<<" operator, so only the two-character case is needed).
