Posted by josh at July 12th, 2008

There’s lots of misinformation flying around the blogosphere about Google’s Protocol Buffers. One common claim is that you can’t parse a protobuf without having the .proto file. This is false, as demonstrated by this 100-line C program that does just that. It can parse an arbitrary protobuf into its field numbers and wire types. This is pretty closely equivalent to what you get from a generic XML parser, except that with XML you get names for the keys (elements) instead of numbers, and strings instead of the handful of wire types that Protocol Buffers define (varint, 32-bit, 64-bit, length-delimited, and the group start/end markers).
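To make the idea concrete, here is a minimal sketch of such a schema-less scan. It is not the program linked above, just an illustration of the same technique. The names `pb_field`, `read_varint`, and `pb_scan` are made up for this example, and it only walks the top level of a message (nested messages show up as opaque length-delimited payloads).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wire types defined by the Protocol Buffers encoding. */
enum { WT_VARINT = 0, WT_64BIT = 1, WT_LEN = 2,
       WT_SGROUP = 3, WT_EGROUP = 4, WT_32BIT = 5 };

/* One decoded field. Hypothetical struct, for illustration only. */
typedef struct {
    uint32_t field;       /* field number from the key */
    int wire_type;        /* one of the WT_* values */
    uint64_t value;       /* varint/fixed value, or payload length for WT_LEN */
    const uint8_t *data;  /* payload start for WT_LEN, otherwise NULL */
} pb_field;

/* Decode a base-128 varint starting at buf[*pos].
 * Returns 0 on success, -1 if the input is truncated or too long. */
static int read_varint(const uint8_t *buf, size_t len, size_t *pos, uint64_t *out)
{
    uint64_t v = 0;
    for (int shift = 0; shift < 70; shift += 7) {   /* at most 10 bytes */
        if (*pos >= len) return -1;
        uint8_t b = buf[(*pos)++];
        v |= (uint64_t)(b & 0x7f) << shift;
        if (!(b & 0x80)) { *out = v; return 0; }
    }
    return -1;
}

/* Scan a buffer into at most max fields, with no .proto file involved.
 * Returns the number of fields found, or -1 on malformed input. */
static int pb_scan(const uint8_t *buf, size_t len, pb_field *out, int max)
{
    size_t pos = 0;
    int n = 0;
    assert(buf != NULL && out != NULL);
    while (pos < len && n < max) {
        uint64_t key;
        if (read_varint(buf, len, &pos, &key) != 0) return -1;
        /* The key packs the field number and the wire type together. */
        pb_field f = { (uint32_t)(key >> 3), (int)(key & 7), 0, NULL };
        switch (f.wire_type) {
        case WT_VARINT:
            if (read_varint(buf, len, &pos, &f.value) != 0) return -1;
            break;
        case WT_64BIT:
        case WT_32BIT: {
            /* Fixed-width values are stored little-endian. */
            size_t width = (f.wire_type == WT_64BIT) ? 8 : 4;
            if (len - pos < width) return -1;
            for (size_t i = 0; i < width; i++)
                f.value |= (uint64_t)buf[pos + i] << (8 * i);
            pos += width;
            break;
        }
        case WT_LEN:
            /* A varint length followed by that many raw bytes. */
            if (read_varint(buf, len, &pos, &f.value) != 0) return -1;
            if (f.value > len - pos) return -1;
            f.data = buf + pos;
            pos += (size_t)f.value;
            break;
        case WT_SGROUP:
        case WT_EGROUP:
            break;  /* group markers carry no payload of their own */
        default:
            return -1;  /* wire types 6 and 7 are not defined */
        }
        out[n++] = f;
    }
    return n;
}
```

Fed the bytes `08 96 01 12 07` followed by the ASCII text `testing`, this should report two fields: field 1 as a varint with value 150, and field 2 as a 7-byte length-delimited payload. What it cannot tell you is whether field 2 is a string, raw bytes, or a nested message; that is exactly the part the .proto file supplies.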

In both the XML and the Protocol Buffer case, you want to have more information if you’re going to actually write programs that consume application-domain data. You want documentation that specifies exactly what all the fields mean, and enough information to turn the on-the-wire values into actual numbers where appropriate. It just so happens that Protocol Buffers specify this information in a structured format called a .proto file.
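As a small illustration of why that extra information matters, the sketch below interprets the same raw wire values in two different ways. The helper names are invented for this example, and it assumes a platform where `double` is a 64-bit IEEE 754 value. Without the .proto file, a generic parser has no way to know which reading is the intended one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: double is 64 bits wide (IEEE 754). */
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* A varint declared as int64/uint64 in the .proto is used as-is.
 * One declared as sint64 is ZigZag-encoded and must be unfolded:
 * 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ... */
static int64_t zigzag_decode(uint64_t v)
{
    return (int64_t)(v >> 1) ^ -(int64_t)(v & 1);
}

/* A 64-bit wire value may be a fixed64 integer or a double.
 * If the schema says double, reinterpret the bits. */
static double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

So a varint with raw value 3 is the number 3 if the field is an int64, but -2 if it is a sint64, and the 64-bit pattern `0x3FF0000000000000` is either a very large integer or the double 1.0.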

Update, 12:50 PM July 12: Apparently I wasn’t clear enough. My 100 lines of C do not use any Protocol Buffer library; they implement the decoding themselves. My point is that the format is so simple that you can parse it generically in 100 lines of C. (If you’re wondering: you’d be lucky to get within a factor of 10 of that for a bare-bones XML parser.)