Protocol Buffers

Posted by josh at July 7th, 2008

Today Google open-sourced a component we’ve used in-house for a long time called Protocol Buffers. It’s a binary format that we use for almost all of our on-the-wire messages and lots of disk-based long-term storage as well. For many (maybe most, though not all) uses, Protocol Buffers kick XML’s ass. In a big way. Seriously, if you’re XML this is the part where you sulk home with your tail between your legs.

Why not just use XML?

Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:

  • are simpler
  • are 3 to 10 times smaller
  • are 20 to 100 times faster
  • are less ambiguous
  • generate data access classes that are easier to use programmatically

(Source: Protobuf Developer Guide)
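To get a feel for where the size difference comes from, here is a rough sketch in Python that hand-encodes two fields using the varint and length-delimited encodings described in the public encoding docs, then compares the result to an XML rendering of the same data. The message layout (field 1 is an integer id, field 2 is a string name) and the XML shape are made up for illustration, and real code would use classes generated by protoc rather than anything like this.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 payload bits per byte)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    # key = (field_number << 3) | wire_type, where wire type 0 is "varint"
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    # wire type 2 is "length-delimited": key, then byte length, then the bytes
    data = text.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


# Hypothetical message: field 1 = id, field 2 = name
binary = encode_int_field(1, 150) + encode_string_field(2, "josh")
xml = b"<person><id>150</id><name>josh</name></person>"

print(len(binary), len(xml))  # 9 46
```

One toy message proves nothing about the 3 to 10 times figure, but it does show the mechanism: field names never go on the wire, only small numeric tags.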

If you or your company needs a very compact, very fast, extensible format for structured data, you should give Protobufs a good look!

(P.S. Of course I don’t speak for Google. The attitude is all me talking and my personal disdain for XML. Google’s official attitude is, of course, much more diplomatic).

Posted in Uncategorized| No Comments | 

Gazelle v0.2 is here!

Posted by josh at June 29th, 2008

It’s been a long time coming, but Gazelle v0.2 is finally here!

To me, Gazelle 0.2 represents a significant shift. With 0.2, Gazelle is finally in a place where I think it’s ready for people to tinker with. In 0.2 there is enough documentation to figure out what’s going on, and the command-line programs like gzlparse have reasonable --help messages and can do useful things. Starting with Gazelle 0.2, your problems are my problems: things that don’t work right should either be fixed or be written down as TODO items.

Gazelle 0.1 to 0.2 was a major overhaul — implementing LL(k) lookahead took major surgery. Half the time the code didn’t actually work, because major rewrites were only partially done. I expect all that to change with Gazelle 0.2 — I want future releases to be far more incremental, and for every commit to leave the repository in a working state. I want a 0.2.1 and 0.2.2 that fix a lot of the edge cases that still aren’t right in 0.2 (you can read more about these shortcomings in the “Tour” section of the manual or in the TODO).

There are still many major features to add to Gazelle in the future — you can see a list in the TODO. But again, I think these can be added without breaking the tree in the meantime.

There’s one major bummer about 0.2. I had to completely remove the “@ignore” feature, which was Gazelle’s answer to letting you ignore whitespace/comments without having a separate lexer. I removed it because I realized that the abstraction I had invented for expressing this concept was not quite right, and that a more general-purpose abstraction was the right answer — the abstraction I have in mind will also handle things like languages embedded in other languages (like Ruby inside HTML: RHTML). But the bummer is that for the moment, Gazelle has no answer for how to ignore whitespace/comments. So it’s clearly not useful for real work yet.

So try it out and send any feedback you have to gazelle-users. Thanks!

Posted in Gazelle| No Comments | 

Defending RPC

Posted by josh at May 23rd, 2008

Steve Vinoski has come out very vocally against RPC in the last few days: see this blog entry and this mailing list post. The blog entry (which I read first) made him sound like someone who just hasn’t been around large systems much, but then I was surprised to see that he’s a senior fellow or architect or something along those lines at a company that does distributed systems.

His blog entry basically makes fun of Cisco for inventing/releasing another RPC system. It’s not clear exactly what he thinks they should have done instead. What is strange about this criticism is that tons of technology companies have developed their own RPC system — Facebook and Cisco publicly, and other technology companies I am familiar with in a not-so-public way. Guess what: large commercial distributed systems are built largely on RPC. Is he arguing that all of the engineers at these companies simultaneously got the bad idea of investing in something they don’t need? If RPC is such a bad idea, then why is everybody doing it?

“Everybody’s doing it” obviously isn’t a justification alone, but it definitely puts the onus on the person making the critique to show why it’s a bad idea. I got a better idea where he was coming from when I read the mailing list post. Here’s the heart of his argument:

the fundamental problem is that RPC tries to make a distributed invocation look like a local one. This can’t work because the failure modes in distributed systems are quite different from those in local systems, so you find yourself having to introduce more and more infrastructure that tries to hide all the hard details and problems that lurk beneath. That’s how we got Apollo NCS and Sun RPC and DCE and CORBA and DSOM and DCOM and EJB and SOAP and JAX-RPC, to name a few off the top of my head, each better than what came before in some ways but worse in other ways, especially footprint and complexity. But it’s all for naught because no amount of infrastructure can ever hide those problems of distribution. Network partitions are real, timeouts are real, remote host and service crashes are real, the need for piecemeal system upgrade and handling version differences between systems is real, etc. The distributed systems programmer *must* deal with these and other issues because they affect different applications very differently; no amount of hiding or abstraction can make these problems disappear.

Finally something we can agree on! Yes, on a network shit happens, and no sane RPC system will try to hide this from you.

But then again, I don’t know of any RPC system that tries to hide this from you except possibly CORBA. Maybe there’s a horrible history here I don’t know about, but no RPC system I have ever encountered tries to hide from you the fact that on a network, shit happens.
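To make "doesn't hide it from you" concrete, here is a small sketch in Python of what I mean. Everything in it is invented for illustration (RpcTimeout, FlakyTransport, and call_with_retry are not from any real RPC library): the call takes an explicit timeout, a failure comes back as a distinct exception type, and what to do about it is left to the caller.

```python
class RpcError(Exception):
    """Base class for failures that only exist because the call is remote."""


class RpcTimeout(RpcError):
    """The peer did not answer within the deadline the caller chose."""


class FlakyTransport:
    """Stand-in for a network: the first `failures` sends time out."""

    def __init__(self, failures: int):
        self.failures = failures

    def send(self, method: str, args: dict, timeout_s: float) -> dict:
        if self.failures > 0:
            self.failures -= 1
            raise RpcTimeout(f"{method} timed out after {timeout_s}s")
        return {"ok": True, "method": method, "args": args}


def call_with_retry(transport, method, args, timeout_s=0.5, attempts=3):
    """Retry on timeout. Only sensible when the call is safe to repeat;
    that judgment belongs to the application, not the framework."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return transport.send(method, args, timeout_s)
        except RpcTimeout as err:
            last_error = err
    raise last_error  # the caller still has to face the failure
```

A local function call has no timeout parameter and no RpcTimeout to catch. The remote one has both right in its interface, which is the opposite of pretending the network isn't there.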

So what are his other criticisms?

RPC systems in C++, Java, etc. also tend to introduce higher degrees of coupling than one would like in a distributed system. Typically you have some sort of IDL that’s used to generate stubs/proxies/skeletons — code that turns the local calls into remote ones, which nobody wants to write or maintain by hand. The IDL is often simple, but the generated code is usually not. That code is normally compiled into each app in the system. Change the IDL and you have to regenerate the code, recompile it, and then retest and redeploy your apps, and you typically have to do that atomically, either all apps or none, because versioning is not accounted for.

Yay, we can agree again. RPC systems that make you do an “all at once” upgrade are a bad idea. But again, no RPC system I have encountered makes you do this. Does this mean that the RPC system guarantees for you that the old and new protocols are compatible? Of course not — you don’t want your framework to be some big “I know what’s best for you” mommy that does really expensive things to solve this problem, like loading both versions of your code at the same time. But any RPC framework worth its salt makes it possible to have different interface versions interoperate. Adding a new parameter? No problem, old servers simply won’t see it. Completely changing the semantics of your call? No problem — just give the new call a new name.
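Here is roughly what those two moves look like, sketched in Python with requests modeled as plain dicts. The method names ("Search", "SearchRanked"), the field names, and the handlers are all hypothetical; the point is only the shape: old code ignores fields it doesn't know, new code supplies defaults for fields old clients don't send, and a call whose meaning changes gets a new name next to the old one.

```python
def search_old_server(request: dict) -> dict:
    # An old server only reads the fields it knows about; anything a newer
    # client added (such as "max_results") is simply never looked at.
    return {"results": [f"hit for {request['query']}"]}


def search_new_server(request: dict) -> dict:
    # A new server gives the added parameter a default, so requests from
    # old clients that omit it keep working.
    limit = request.get("max_results", 3)
    return {"results": [f"hit {i} for {request['query']}" for i in range(limit)]}


def search_ranked(request: dict) -> dict:
    # Different semantics (results come back in reverse order here), so it
    # gets its own name instead of silently changing what "Search" means.
    results = search_new_server(request)["results"]
    return {"results": list(reversed(results))}


# The new server exposes both calls side by side.
NEW_SERVER_METHODS = {"Search": search_new_server, "SearchRanked": search_ranked}


def dispatch(methods: dict, name: str, request: dict) -> dict:
    if name not in methods:
        return {"error": f"unknown method {name}"}
    return methods[name](request)
```

With this arrangement, clients and servers can be upgraded one at a time in either order: a new client talking to an old server loses the new parameter, and an old client talking to a new server gets the default.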

Steve’s criticism amounts to “sucky RPC systems suck.” Yes Steve, yes they do. But a lot of the technology world is running on non-sucky RPC systems, and from time to time you get a glimpse of that when a company like Facebook or Cisco releases their internal RPC system to the outside world. Did Steve check to see if Cisco’s new RPC system is subject to any of his critiques? I haven’t, but I would suspect it isn’t.

Posted in Uncategorized| 10 Comments | 

Gazelle Grammar Visualization

Posted by josh at April 10th, 2008

I’ve been quiet about Gazelle news lately, but since I wrote last I’ve hit 3 of my 6 goals for Gazelle 0.2, and one that I hadn’t thought to include. To review those goals and see which ones I’ve completed:

  • complete Strong-LL(k) lookahead support. (it’s not 100% complete yet, but it’s definitely solid enough for a 0.2 release)
  • a command-line compiler program (gzlc) that takes reasonable options and is simple enough to use by reading its --help
  • a “tour” section for the manual
  • a command-line program (gzlparse) that can output the parse tree in a useful format, so you can see how Gazelle parses your input text.
  • a test suite, so that when people report bugs I can add the bugs to the test suite and not regress.
  • (stretch): make Gazelle self-hosting, so that the parser is more robust and easier to understand than the hand-written recursive descent parser I’m currently using. I don’t want people to have to deal with corner-case parser bugs.
  • a way to visualize grammars, to spot-check them against your expectations

It’s the grammar visualization that I forgot to include. I mentioned parse tree visualization a few blog posts ago, but this is different — one is visualizing how a bunch of text got parsed, the other is visualizing the grammar itself.

It still has room for improvement, but here is what my grammar visualization currently looks like for JSON. You can see an NFA for each one of your rules, a DFA for each state of lookahead, and the DFAs that do the lexing.

The latest code from Git (note that I recently moved from repo.or.cz to GitHub) can generate these grammar dumps: just pass ‘-d’ to gzlc.

Posted in Gazelle| 1 Comment | 

The future of automatic memory management

Posted by josh at April 9th, 2008

Observation #1: stop-the-world garbage collection is a thorn in the side of latency-sensitive applications.

Observation #2: we will very soon have more cores than we know what to do with.

Prediction: fully concurrent garbage collection is the future of automatic memory management. I’m talking garbage collectors that run in other threads and clean up after me without ever stopping me in the middle of what I’m doing.

It will almost certainly be more expensive in terms of total CPU time, and probably can’t be as aggressive in terms of what it can reclaim at any point in time, but for most applications the latency guarantees will far outweigh those costs.

Discuss.

Posted in Uncategorized| 4 Comments | 
