# Parser Combinators

The idea that underlies parser combinators is that all tasks that involve

reading complex input can ultimately be reduced to *one* primitive: read

a character from the input, and figure out what to do with it.

Of course, to figure out what to do with it, you might need to look at

more than one character; you might need to look at a bunch, to choose between

options.

And, of course, you don't really want to think of it in terms of reading

characters. In fact, the way that you'd generally want to think it is

as a formatted object which was serialized in some way.

So - how do geeks talk about what a serialized format looks like? Well,

typically, we do one of two things: we write a regular expression, or

we write a grammar.

Here's where parser combinators start to make sense. A few years ago, some

really bright functional language geeks realized that if you say that

a parser is a function - then you can write a set of meta-functions that

compose those parsers - and in fact, you can write those meta-functions so

that you end up with something that looks darned similar to the EBNF grammar.

For some examples, I'm going to follow the lead of the guys who originally

came up with the idea, and pretend that we can create new operator

symbols if we want. I'm going to use triple-symbols for my new operators:

So here's the basic list of combinators:

**Sequencing**

This returns a new parser for a followed by b.

**Alternation**

This returns a new parser for an a or a b.

**Optional**

this returns a new parser which parses an a if it can, and succeeds no matter what. (That is, an optional a in the

grammar.)

**Lists**

These return parsers for repetitions, either requiring

at least one or not.

So, take a simple grammar, like one for Lisp lists:

**Basic Lisp Grammar**

**Combinator Parser for Lisp Code**

Of course, we don't have any definitions of lparen, rparen, number, or

symbol. That's because we haven't talked about how we get down to the

low level primitives. It's pretty simple really: there's exactly two

primitives: one takes a set of characters, and returns a parser that

succeeds if the current input character contains a character in the set;

the other primitive just reads one character and succeeds. We'll call the

first one "chars", and the second one "any". So now, we can fill in a bit:

**BNF for tokens in Lisp grammar**

In parser combinators, this becomes:

**Parser Combinators for Tokens**

That's almost the whole show. There's one little bit left - which is attaching

actions to parsing elements. That is something that you'd typically want to

do. There have been a bunch of ways that people have come up with for handling

actions. I'll just toss one in, for the sake of simplicity; they all have

roughly the same general idea.

(Actually, I'm out of time for the evening, so I'll save this for tomorrow.)