Archive for July, 2008

Why Gazelle Matters, part 2

July 9th, 2008 Josh 2 comments

Every day when I read the programming reddit, I see things that reaffirm to me why Gazelle matters.

Yesterday it was an “ask reddit”: “Need a C library for parsing C files, suggestions?” Responses include:

  • “How about gcc?”, clearly not realizing that gcc is (1) not a library and (2) ridiculously complicated.
  • “Here are an ANSI C grammar for lex and yacc. [...] (NB: These were last updated in 1995, so I don’t know if you’ll need to tweak them any. But, at least they’ll get you close [...])”. I am constantly surprised that people do not realize how utterly useless software of unknown quality is. Can you imagine asking for an implementation of SHA1, and having someone hand you some code and say, “I’m not sure if it’s quite right, but it should at least get you part of the way there”? You don’t know where the possible problems are, and you don’t know anything about its design process. You might as well start from scratch; that’s how useless half-written code is.
  • “Language.C: manipulating and generating C abstract syntax from Haskell”. So here you have someone who’s doing the hard work of parsing C and making sure it’s correct, but he’s writing his library in Haskell so it only works for Haskell. Sadly useless to 99.9% of the programming world.

Someone also did mention Elsa, which is probably the best solution to the original guy’s problem, but then someone else replied:

elsa looks really good. I need code detail that the preprocessor doesn’t know about, but elsa can probably get me there. Any interest in a python wrapper?

There you go again — even when someone has done a good job (like Elsa has), it’s still useless to people parsing from other languages unless you write specific “bindings” for each language you want to parse from. Madness, pure madness! The idea that it should take N^2 work to parse N languages from N languages is madness.

The two most important design goals of Gazelle are:

  • reusable grammars: grammars that can be used by anyone without modification. Grammars that can have test suites, to ensure quality and give people confidence that they are correct.
  • grammars that you can use from any language, without bindings. Sure, you have to write bindings for Gazelle itself, but that only has to happen once per host language, not once per language you want to parse. So parsing N languages from N languages takes N+N effort (N grammars plus N Gazelle bindings), not N^2.
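
To put rough numbers on that N^2 versus N+N claim, here is a small sketch in Python. It is not Gazelle code: the function names and the list of languages are made up purely for illustration, and "effort" is counted crudely as one unit per binding or grammar.

```python
def per_pair_effort(host_languages, parsed_languages):
    """One hand-written binding for every (host language, parser) pair."""
    return len(host_languages) * len(parsed_languages)


def shared_runtime_effort(host_languages, parsed_languages):
    """One runtime binding per host language plus one reusable grammar
    per parsed language."""
    return len(host_languages) + len(parsed_languages)


# Hypothetical set of languages, used both as hosts and as parse targets.
languages = ["C", "Python", "Ruby", "Java", "Haskell"]

print(per_pair_effort(languages, languages))        # 5 * 5 = 25
print(shared_runtime_effort(languages, languages))  # 5 + 5 = 10
```

The gap widens quickly: with 20 languages on each side the same count is 400 versus 40.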

Protocol Buffers

July 7th, 2008 Josh No comments

Today Google open-sourced a component we’ve used in-house for a long time called Protocol Buffers. It’s a binary format that we use for almost all of our on-the-wire messages and lots of disk-based long-term storage as well. For many (maybe most, though not all) uses, Protocol Buffers kick XML’s ass. In a big way. Seriously, if you’re XML this is the part where you slink home with your tail between your legs.

Why not just use XML?

Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:

  • are simpler
  • are 3 to 10 times smaller
  • are 20 to 100 times faster
  • are less ambiguous
  • generate data access classes that are easier to use programmatically

(Quoted from the Protobuf Developer Guide.)
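
To get a feel for the "smaller" claim, here is a hand-rolled sketch in Python of how the Protocol Buffers wire format lays out one tiny record, next to an XML rendering of the same data. This is not the generated-class API from the protobuf library. The record layout (field 1 is a name string, field 2 is an integer id), the helper function names, and the XML element names are all assumptions invented for this example, and a single toy record is not a benchmark.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # more bytes follow
        else:
            out.append(low_bits)         # last byte
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """A field key is (field_number << 3) | wire_type, stored as a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Wire type 0: a varint value."""
    return encode_key(field_number, 0) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Wire type 2: a length prefix followed by the UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_key(field_number, 2) + encode_varint(len(data)) + data


# Hypothetical record: field 1 = name, field 2 = id.
binary = encode_string_field(1, "Ada") + encode_int_field(2, 150)
xml = "<person><name>Ada</name><id>150</id></person>".encode("utf-8")

print(binary.hex())           # 0a03416461109601
print(len(binary), len(xml))  # 8 45
```

Field names never appear on the wire, only small field numbers, which is where most of the size difference in this toy case comes from.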

If you or your company needs a very compact, very fast, extensible format for structured data, you should give Protobufs a good look!

(P.S. Of course I don’t speak for Google. The attitude is all me talking and my personal disdain for XML. Google’s official attitude is, of course, much more diplomatic.)
