Posted by josh at July 9th, 2008

Every day when I read the programming reddit, I see things that reaffirm to me why Gazelle matters.

Yesterday it was an “ask reddit”: “Need a C library for parsing C files, suggestions?” Responses included:

  • “How about gcc?”, clearly not realizing that gcc is (1) not a library and (2) ridiculously complicated.
  • “Here are an ANSI C grammar for lex and yacc. [...] (NB: These were last updated in 1995, so I don’t know if you’ll need to tweak them any. But, at least they’ll get you close [...])”. I am constantly surprised that people do not realize how utterly useless software of unknown quality is. Can you imagine asking for an implementation of SHA1 and having someone hand you some code and say, “I’m not sure if it’s quite right, but it should at least get you part of the way there”? You don’t know where the possible problems are, and you don’t know anything about its design process. You might as well start from scratch; that’s how useless half-written code is.
  • “Language.C: manipulating and generating C abstract syntax from Haskell”. So here you have someone who’s doing the hard work of parsing C and making sure it’s correct, but he’s writing his library in Haskell so it only works for Haskell. Sadly useless to 99.9% of the programming world.

Someone did also mention Elsa, which is probably the best solution to the original poster’s problem, but then someone else replied:

elsa looks really good. I need code detail that the preprocessor doesn’t know about, but elsa can probably get me there. Any interest in a python wrapper?

There you go again: even when someone has done a good job (as Elsa has), the result is still useless to people working in other languages unless someone writes specific “bindings” for each language you want to parse from. Madness, pure madness! The idea that it should take N^2 work to parse N languages from N languages is madness.

The two most important design goals of Gazelle are:

  • reusable grammars: grammars that can be used by anyone without modification. Grammars that can have test suites, to ensure quality and give people confidence that they are correct.
  • grammars that you can use from any language, without per-grammar bindings. Sure, someone has to write bindings for Gazelle itself, but that only has to happen once per host language, not once per language you want to parse. So parsing N languages from N languages takes N+N effort (N grammars plus N Gazelle bindings), not N^2.
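
To make the N+N versus N^2 arithmetic concrete, here is a rough sketch in Python. The language lists and function names are made up for illustration, and none of this is Gazelle's actual API; it only counts the pieces of work each approach needs.

```python
def per_pair_bindings(parsed_langs, host_langs):
    """One hand-written parser binding for every (parsed, host) language pair."""
    return [(parsed, host) for parsed in parsed_langs for host in host_langs]


def shared_runtime(parsed_langs, host_langs):
    """One reusable grammar per parsed language, plus one runtime binding per host."""
    grammars = [("grammar", parsed) for parsed in parsed_langs]
    bindings = [("binding", host) for host in host_langs]
    return grammars + bindings


# Hypothetical example lists: four languages to parse, four languages to parse from.
parsed = ["C", "Java", "Python", "Ruby"]
hosts = ["C", "Haskell", "Python", "Ruby"]

pair_count = len(per_pair_bindings(parsed, hosts))   # N * N pieces of work
shared_count = len(shared_runtime(parsed, hosts))    # N + N pieces of work

print(pair_count, shared_count)  # prints "16 8"
```

With four languages on each side the gap is 16 versus 8, and it widens quickly: at ten languages each it is 100 versus 20.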