Parsing experiments in Python

Recently, I’ve been experimenting with parser generators in Python (we use Python for a lot of our projects at Evince). Previously, I’d hand-written recursive-descent parsers for simple parsing tasks, but I’d never felt comfortable enough using them in production code.

A couple of weeks ago, Essien showed me Jane – a toy language he’d been developing, which compiled to C. After a few minutes exploring his implementation, I tried to quickly code a backend that generated Python code. My goal wasn’t really a fully-functional implementation, just to find out how far I could go. I did get it semi-functional – it choked on nested blocks, because of Python’s indentation.

Anyways, looking at flex and bison got me thinking “surely, there’s got to be stuff like this in Python?” so I did some googling and came across an O’Reilly article on pyparsing. My test project was an expression parser – addition, subtraction, multiplication and division, with support for floating-point numbers and variables. After running into problems trying to use pyparsing with recursive productions, I did some more searching and landed on this page…woot!

Eventually, I selected SPARK (because it looked interesting) and SimpleParse (because it was based on the fast mxTextTools library). I tried out SimpleParse first, and had a parser in minutes – pretty much the time it took me to type out the grammar. It took a bit longer to generate an AST (mostly time spent studying the documentation, and trying several wrong approaches…hey, I’m new at this stuff :), but eventually, I had that done too. Just for good measure, I threw in peak.util.assembler and compiled the AST to Python bytecode, and finally used the “new” module to generate a native Python function. Fun!

The next day, I showed all this to Bayo, and suggested he could use it in the payroll application (part of a larger HR application we’re developing). While discussing his current implementation, he mentioned expressions depending on other expressions, and I thought “that’s easy; functions can call other functions, no problem”. Of course, this meant the parser had to support function calls. I thought “okay, whatever, I’ll have a go at it later”.

At about 2:30 am, I was about to go to bed, and thought I’d just take a look, and see how much work it’d be adding support for functions. 3 minutes later, I was done with the implementation – I added a new production for (no-argument) functions, added a new AST node and modified my compiler to emit the appropriate bytecode for function calls (the “dis” module came in pretty handy here). Needless to say, I’m hooked.

Since then, I’ve tried to implement a vCard parser (the formal grammar specification for vCard 3.0 is here), port Jane’s compiler to Python and build a simple template language. I’ve run into, what I’d consider limitations of SimpleParse: its inability to specify ignored characters, and it’s lack of support for back-tracking. Possibly, though, these are as a result of my own inexperience, but I’d like to try out a few other parser-generators first. Next up: PLY.


2 thoughts on “Parsing experiments in Python

  1. This is great, it means I can now check out a few more parsers like the ones mentioned here. I’ve used quite a few in the past, but haven’t found the right one that does everything.

    I will give SPARK and SimpleParse a go later.

    Thanks for sharing these tip :)

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s