Posted by josh at January 7th, 2008

I’m not sure I’ve managed to convince very many people that my parsing framework Gazelle is a big deal. Most people don’t think of parsing as something they do a lot of. A few quick-and-dirty regular expressions here and there are what most programmers get by on. “Oh, I need to validate an email address? Hrm, that’s something like /[\w\.\-]*@[\w\.]*/, right? Oh wait, the username can have escaped @-signs in it? Ah hell, my regex is close enough.”
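To see how close “close enough” really is, here is a quick Python sketch. Python is just my choice for illustration; the post doesn’t tie the regex to any language. It checks whole strings against the pattern exactly as written. The bare pattern is unanchored, so a plain search would accept even more. The sample inputs are made up for the demonstration. The quoted one is a legal address under RFC 5322, which allows `@` inside a quoted local part.

```python
import re

# The pattern from the post, unchanged (Python's re accepts the same syntax).
QUICK_EMAIL = re.compile(r"[\w\.\-]*@[\w\.]*")


def looks_like_email(s):
    """True if the entire string matches the quick-and-dirty pattern."""
    return QUICK_EMAIL.fullmatch(s) is not None


# Hand-picked inputs; the quoted form is a legal RFC 5322 address.
samples = [
    "josh@example.com",
    "@",
    "a@.",
    '"josh@home"@example.com',
]

for addr in samples:
    print(f"{addr:26} -> {looks_like_email(addr)}")
```

A lone `@` and `a@.` both pass, while the perfectly legal quoted address fails. That is roughly what “close enough” buys you.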

But forget about all that for a moment, including the crazy productivity gains you could get just from being able to pull in a standard, working email-address parsing module from any language. Let’s just talk about performance.

I want to talk about Mongrel for a second. Mongrel has been the star of the Ruby web server world for a long time because it vastly outperforms WEBrick, the painfully slow pure-Ruby server that ships with Ruby. Mongrel has also become something of a soap opera. I won’t go into it, but while that escapade spiraled into a wild frenzy of name-calling, dramatic exits, and infighting, another little web server called Thin came out of nowhere, poised to do Mongrel better than Mongrel.

I haven’t tried Thin myself or verified any of its claims, but its developers say it is even faster than Mongrel. How? Well, according to them, it is by using “the Mongrel parser, the root of Mongrel speed and security.”

Did you catch that?

Amid all this drama, time, and effort, the core of Mongrel’s technology is a fast parser. Mongrel’s technical edge boils down to a grammar: a description of the HTTP language written with Ragel, a tool for parsing regular languages.
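For readers who haven’t seen Ragel: it compiles a grammar like that one into a finite state machine that walks the input one byte at a time, remembering little more than which state it is in. The Python sketch below hand-writes a tiny machine of that general shape for a simplified HTTP request line (`METHOD SP TARGET SP HTTP/d.d CRLF`). It is an illustration under stated assumptions, not Mongrel’s grammar and not actual Ragel output. The state names and `parse_request_line` are hypothetical, methods are limited to uppercase letters, and headers are ignored entirely.

```python
# Byte values used by the machine.
SP, CR, LF = 0x20, 0x0D, 0x0A

# Hypothetical state numbers; a generated parser keeps one integer like this.
METHOD, TARGET, VERSION, EXPECT_LF, DONE = range(5)

# Template for the version field: '9' stands for "any ASCII digit".
VERSION_SHAPE = b"HTTP/9.9"


def parse_request_line(line):
    """Split a request line such as GET /path HTTP/1.1 + CRLF into fields.

    Each byte is examined exactly once; the only memory is the current
    state, where the current field started, and the fields found so far.
    Raises ValueError on any byte the machine has no transition for.
    """
    state = METHOD
    start = 0
    fields = []
    for i, b in enumerate(line):
        if state == METHOD:
            if b == SP and i > start:          # space ends a non-empty method
                fields.append(line[start:i])
                start, state = i + 1, TARGET
            elif not (0x41 <= b <= 0x5A):      # simplified: A-Z only
                raise ValueError(f"bad method byte at offset {i}")
        elif state == TARGET:
            if b == SP and i > start:          # space ends a non-empty target
                fields.append(line[start:i])
                start, state = i + 1, VERSION
            elif b <= SP or b >= 0x7F:         # no controls, spaces, or non-ASCII
                raise ValueError(f"bad target byte at offset {i}")
        elif state == VERSION:
            pos = i - start
            if pos == len(VERSION_SHAPE):      # version complete; need CR
                if b != CR:
                    raise ValueError(f"expected CR at offset {i}")
                fields.append(line[start:i])
                state = EXPECT_LF
            else:
                want = VERSION_SHAPE[pos]
                ok = (0x30 <= b <= 0x39) if want == ord("9") else (b == want)
                if not ok:
                    raise ValueError(f"bad version byte at offset {i}")
        elif state == EXPECT_LF:
            if b != LF:
                raise ValueError(f"expected LF at offset {i}")
            state = DONE
        else:                                  # DONE: nothing may follow CRLF
            raise ValueError(f"trailing byte at offset {i}")
    if state != DONE:
        raise ValueError("request line ended early")
    return tuple(f.decode("ascii") for f in fields)


print(parse_request_line(b"GET /index.html HTTP/1.1\r\n"))
# ('GET', '/index.html', 'HTTP/1.1')
```

A full HTTP grammar needs many more states, but stepping through each byte stays a constant amount of work no matter how many rules the grammar has.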

Imagine if parsers this fast and this powerful (more powerful, actually, since Ragel can’t handle context-free grammars) were available directly from Ruby. You would get the performance of Mongrel or Thin without writing any C code at all. A Ragel parser, on the other hand, means writing a custom extension for Ruby, because the parser Ragel generates for Mongrel is plain C code.

Better still, you wouldn’t have to write the grammar yourself, because chances are you won’t be the first person who wants to parse HTTP.

That is why Gazelle matters. Am I making any sense?