Defending RPC
Posted by josh at May 23rd, 2008
Steve Vinoski has come out very vocally against RPC in the last few days: see this blog entry and this mailing list post. The blog entry (which I read first) made him sound like someone who just hasn’t been around large systems much, but then I was surprised to see that he’s a senior fellow or architect or something along those lines at a company that does distributed systems.
His blog entry basically makes fun of Cisco for inventing/releasing another RPC system. It’s not clear exactly what he thinks they should have done instead. What is strange about this criticism is that tons of technology companies have developed their own RPC system — Facebook and Cisco publicly, and other technology companies I am familiar with in a not-so-public way. Guess what: large commercial distributed systems are built largely on RPC. Is he arguing that all of the engineers at these companies simultaneously got the bad idea of investing in something they don’t need? If RPC is such a bad idea, then why is everybody doing it?
“Everybody’s doing it” obviously isn’t a justification alone, but it definitely puts the onus on the person making the critique to show why it’s a bad idea. I got a better idea where he was coming from when I read the mailing list post. Here’s the heart of his argument:
the fundamental problem is that RPC tries to make a distributed invocation look like a local one. This can’t work because the failure modes in distributed systems are quite different from those in local systems, so you find yourself having to introduce more and more infrastructure that tries to hide all the hard details and problems that lurk beneath. That’s how we got Apollo NCS and Sun RPC and DCE and CORBA and DSOM and DCOM and EJB and SOAP and JAX-RPC, to name a few off the top of my head, each better than what came before in some ways but worse in other ways, especially footprint and complexity. But it’s all for naught because no amount of infrastructure can ever hide those problems of distribution. Network partitions are real, timeouts are real, remote host and service crashes are real, the need for piecemeal system upgrade and handling version differences between systems is real, etc. The distributed systems programmer *must* deal with these and other issues because they affect different applications very differently; no amount of hiding or abstraction can make these problems disappear.
Finally something we can agree on! Yes, on a network shit happens, and no sane RPC system will try to hide this from you.
But then again, I don’t know of any RPC system that tries to hide this from you except possibly CORBA. Maybe there’s a horrible history here I don’t know about, but no RPC system I have ever encountered tries to hide from you the fact that on a network, shit happens.
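As a rough illustration of what "not hiding it" can mean, here is a minimal Python sketch of a client stub that reports network trouble as distinct errors instead of pretending the call is local. The names `RpcError`, `RpcTimeout`, `RpcUnavailable`, and `call_remote` are made up for this example and are not taken from any real RPC system.

```python
import socket


class RpcError(Exception):
    """Base class for failures a remote call can surface (hypothetical name)."""


class RpcTimeout(RpcError):
    """The peer accepted the request but did not answer within the deadline."""


class RpcUnavailable(RpcError):
    """The peer could not be reached at all."""


def call_remote(host, port, payload, timeout_s=0.5):
    """Send payload bytes and return the reply bytes.

    Network failures are raised to the caller as RpcError subclasses,
    so the caller has to decide what to do about them.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as conn:
            conn.sendall(payload)
            conn.shutdown(socket.SHUT_WR)  # signal end of request
            return conn.recv(65536)
    except socket.timeout as exc:
        raise RpcTimeout(f"no reply from {host}:{port} within {timeout_s}s") from exc
    except OSError as exc:
        raise RpcUnavailable(f"cannot reach {host}:{port}: {exc}") from exc
```

A caller would wrap `call_remote(...)` in its own `try`/`except RpcError` and choose between retrying, failing over, or giving up, which is exactly the application-specific decision the quoted passage says cannot be abstracted away.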
So what are his other criticisms?
RPC systems in C++, Java, etc. also tend to introduce higher degrees of coupling than one would like in a distributed system. Typically you have some sort of IDL that’s used to generate stubs/proxies/skeletons — code that turns the local calls into remote ones, which nobody wants to write or maintain by hand. The IDL is often simple, but the generated code is usually not. That code is normally compiled into each app in the system. Change the IDL and you have to regenerate the code, recompile it, and then retest and redeploy your apps, and you typically have to do that atomically, either all apps or none, because versioning is not accounted for.
Yay, we can agree again. RPC systems that make you do an “all at once” upgrade are a bad idea. But again, no RPC system I have encountered makes you do this. Does this mean that the RPC system guarantees for you that the old and new protocols are compatible? Of course not — you don’t want your framework to be some big “I know what’s best for you” mommy that does really expensive things to solve this problem, like loading both versions of your code at the same time. But any RPC framework worth its salt makes it possible to have different interface versions interoperate. Adding a new parameter? No problem, old servers simply won’t see it. Completely changing the semantics of your call? No problem — just give the new call a new name.
Steve’s criticism amounts to “sucky RPC systems suck.” Yes Steve, yes they do. But a lot of the technology world is running on non-sucky RPC systems, and from time to time you get a glimpse of that when a company like Facebook or Cisco releases their internal RPC system to the outside world. Did Steve check to see if Cisco’s new RPC system is subject to any of his critiques? I haven’t, but I would suspect it isn’t.
Josh,
Sounds like you haven’t been in the industry that long. Steve Vinoski has been around many large distributed systems and was a major contributor to the CORBA specs. Looks like you haven’t done much programming with CORBA, SOAP, EJB etc. in large systems to see why RPC is flawed compared to messaging. If you noticed, none of the above distributed object mechanisms are popular anymore. Why? It’s precisely for the reasons you have quoted from Steve’s email. Have you even noticed that SOAP is not an acronym anymore? (It used to be Simple Object Access Protocol.) Most web services are not doing RPC with SOAP anymore; instead they are using it for messaging.
Looking at your resume you seem like a bright kid. But remember that smart people learn from other people’s mistakes. If you are naive enough to commit the blunders the rest of the industry learned from the hard way, then it is your loss. Also, before you attack somebody like Steve you need to do your homework, or else it will easily expose how inexperienced you are. 5 years from now you will realize this.
Rajith Attapattu
I’m personally in the camp that says there are better architectures than RPC for most of the use cases where I’ve seen RPC deployed. And from my experience, I’m almost as cynical as Vinoski is about any new proprietary/non-standard middleware stack.
Of course, it doesn’t make sense to bash RPC unless you’re comparing it to a viable alternative, which I think is a major failure in the blog and email posts you linked to. (Though as Vinoski points out, he’s written plenty elsewhere about his preferred alternatives. I’ve been reading his stuff for a long time, and it’s generally very good.)
RPC is a client-server model where servers provide arbitrary function-call APIs to remote clients. Two alternative approaches that Vinoski writes about are REST, a client-server model where servers publish addressable resources implementing a uniform interface, and Erlang/OTP, which models distributed systems primarily in terms of process trees and message queues.
Let’s consider one of the RPC systems you and I have worked with. Yes, it got the job done. But it used a proprietary wire format that had to be reverse engineered by any client that didn’t link to the (closed-source) client library. It tunneled over HTTP, but used its own metadata format, meaning that the metadata in the HTTP headers was both incomplete and incorrect and so it could not take advantage of standard HTTP proxies or libraries. It used an IDL that had several of the problems Vinoski cites in your links, such as mismatches between IDL datatypes and client-side language datatypes. It had poorly-reinvented solutions for problems like caching and character encoding that were already solved by the standard protocols it tunneled through. The proprietary nature of the protocol, the non-uniformity of interfaces and representations, and the use of code generation, all meant that it took great effort to interoperate with clients or servers written in languages without provided libraries. All together, this meant that applications were tightly coupled to the middleware layer itself, and services were more coupled than necessary to their clients.
Meanwhile, I believe that 80% or more of the applications implemented in that system could have been done better in REST fashion using plain old HTTP with standard representation formats like Atom or simple JSON structures, and that the advantages of that approach would outweigh the disadvantages.
Sure, the best RPC systems usually have great optimizations for things like deserialization and bandwidth. But even at Google or Amazon, the number of applications where a 20% increase in parsing speed is going to make a noticeable difference in performance or cost is fairly small. In a business where the entire infrastructure runs on 4 or 5 servers, the number is probably zero. When you’re talking to databases and web browsers and doing computations and so on, the middleware layer is simply not the bottleneck.
On the other hand, many middleware systems do implement genuinely useful features that you don’t get for free from REST or HTTP. I just don’t think those features tend to justify the costs, for most applications.
(I’d love to talk about this in person some time, since I could draw on more specific examples from code we’ve worked on together.)
Matt Brubeck
@Rajith: I may have only been at this for four years professionally, but in that time I’ve been deep into the guts of some of Amazon and Google’s most scalable systems. I’m not making stuff up based on theories, I’m observing what the brilliant people around me actually do when they are engineering scalable software.
Since nothing you write is a substantive critique of my blog entry, there’s not really anything more to say. BTW, I’m definitely not defending SOAP, CORBA, or EJB: if the existing open-source RPC systems were any good, companies wouldn’t be reinventing this stuff all over again. XML-RPC is the only one of the open-source systems that I can stomach.
@Matt: I was pretty open-minded about REST for a while, but I’ve come to believe that it’s a lot of hot air. Standard data formats are better than proprietary ones obviously, but I just don’t buy the “tunnels, proxies, and caches” argument. Nor do I buy that complex systems are naturally modeled as resources. But yes, we should talk more about this another time!
josh
@Rajith and @Matt: to add to my last comment, consider the information Google has published about BigTable and Chubby, major systems on which much of Google is built. The APIs to these systems are based on RPC. Could they be shoehorned into REST? Of course, as XML has so painfully taught us, anything can be shoehorned into anything. But is it a good fit? The caching and proxying surrounding these systems is far too system-specific for any standard HTTP functionality to get right.
If what you’ve got sitting around is a bunch of libraries that do HTTP, I can see why REST looks appealing. But when you’ve got a mind-blowingly good RPC system sitting around, it seems pretty obviously the best answer to your request/reply needs. Would it be nice to have it more widely available and standardized? Yes. But all else being equal, on purely technical merits, I don’t think HTTP and REST can stand up as a competitor to a good RPC system for programmatic request/reply needs.
josh
Josh, I’ve posted a response to my blog. Your comment system doesn’t seem to allow links, but you can see my response at steve.vinoski.net.
Steve Vinoski
@Josh
Frankly there is nothing much to critique in your blog entry. There are no technical arguments to say why RPC is better while the rest of the industry is moving more towards message-oriented middleware. All you have done is to say Facebook and Google are using RPC in the backend. Dude, simply because it’s Google or Facebook do you automatically believe that RPC was the best choice? Have you considered how different it may have been if they used a message-oriented approach? Do you know the advantages/disadvantages each approach would have yielded for the given systems? If you do have them please state them and we can have a good debate about it. Simply because something works, does that mean that it is the best way to do it?
I think you got the definition of RPC wrong. RPC by definition provides the illusion that a routine in a remote address space resides in the local address space. As a programmer you write the same code whether the routine is local or remote. If my remoting framework doesn’t hide the network from me, then I am doing something else, not RPC.
If only a developer’s life were this easy. Here are some use cases taken from the Thrift white paper:
1. Added field, old client, new server. In this case, the old client does not send the new field. The new server recognizes that the field is not set, and implements default behavior for out-of-date requests.
2. Removed field, old client, new server. In this case, the old client sends the removed field. The new server simply ignores it.
3. Added field, new client, old server. The new client sends a field that the old server does not recognize. The old server simply ignores it and processes as normal.
4. Removed field, new client, old server. This is the most dangerous case, as the old server is unlikely to have suitable default behavior implemented for the missing field. It is recommended that in this situation the new server be rolled out prior to the new clients.
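The four cases, in the order listed, can be played out in a small Python sketch. This is only a toy model under stated assumptions: requests are plain dicts, and `make_handler` plus the field names `user`, `region`, and `locale` are hypothetical, not Thrift's actual generated code or wire format.

```python
def make_handler(known_fields, defaults):
    """Build a server handler that tolerates schema drift (hypothetical helper).

    known_fields: set of field names this server version understands.
    defaults: fallback values for known fields that a client may omit.
    """
    def handle(request):
        # Fields the server does not know are silently ignored (cases 2 and 3).
        seen = {k: v for k, v in request.items() if k in known_fields}
        # Known fields the client omitted fall back to a default if one exists (case 1).
        merged = {**defaults, **seen}
        missing = sorted(known_fields - set(merged))
        if missing:
            # Case 4: the client stopped sending a field and the server has no default.
            raise ValueError(f"missing required fields: {missing}")
        return merged
    return handle


# Server versions: the original, one that added "locale", one that removed "region".
old_server = make_handler({"user", "region"}, defaults={})
new_server_added = make_handler({"user", "region", "locale"}, defaults={"locale": "en"})
new_server_removed = make_handler({"user"}, defaults={})

case1 = new_server_added({"user": "josh", "region": "us"})               # old client
case2 = new_server_removed({"user": "josh", "region": "us"})             # old client
case3 = old_server({"user": "josh", "region": "us", "locale": "fr"})     # new client
try:
    case4 = old_server({"user": "josh"})                                  # new client
except ValueError as exc:
    case4 = exc
```

Only the fourth call fails in this model, which matches the white paper's advice to roll out the new server before the new clients in that situation.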
You can see from case 4 that adding a parameter is not as easy as you think. If it was such an easy problem to solve, then why is Thrift (which is what Facebook is using, and one of the systems that you seem to tout) recommending that you roll out a new server first?
Try telling that to your manager when the system is already in production.
Rajith Attapattu
I don’t care about Cisco or the definition of RPC, but I do still want to convince you that the REST architectural style has something to bring to the table. I’m not convinced that you’ve actually fully understood what REST means. I do hope you’ll read RESTful Web Services by Sam Ruby and Leonard Richardson, because I think you’d actually appreciate quite a bit of what it has to say. Plus, most of the examples are in Ruby!
This is really a response to this comment thread, but I thought I’d converse with you directly rather than via Steve’s blog.
Sure you can. Oh, but other software is available that does REST on HTTP? That’s a bad thing? If someone wrote a pure-Java ICE stack that interoperated with the original, would that make ICE a worse choice, or a better one?
It’s true. REST - even a specific RESTful architecture like Ruby and Richardson’s ROA, on a specific protocol like HTTP - does not constrain you to a single software implementation.
But I’d say that it’s actually the other choice that’s underconstrained. RPC-like middleware frameworks like ICE and Thrift may provide plenty of guidance as to what software needs to run on each end of the connection, but they are completely unconstrained when it comes to the hard questions of design and architecture. What should my API look like? Should session state be maintained by the server or the client? How are state changes represented? How does data point to other data, possibly in other services? REST answers all of these. ICE and Thrift leave it up to the developer to design client-server interaction patterns. (At least some of the REST constraints can be followed within an RPC-like system, and should be for any system that has to scale and interoperate).
Not true. HTTP has a wire format, just like the request-reply system I mentioned above does. HTTP’s format is richer in some ways, and poorer in others. Both include extensible metadata. HTTP’s wire format also defines semantics for things like modification and expiry dates, authentication tokens, retry counts, character encoding, redirections to other services, and data fields whose formats are described by MIME types. The proprietary system allows only one format for its data fields, but this data format maps better onto some popular programming language data structures than the common MIME types used with HTTP.
But if you do want to pass simple RPC-like parameters through HTTP without referencing a separately-specified data format, the URI specification gives the standard way to do this for HTTP GET (using URI-encoded parameters in the query string), and the “application/x-www-form-urlencoded” MIME type is a default format for use with HTTP POST. Neither is really that great as serialization formats go, but I’ll freely admit that taking full advantage of HTTP does require the designer to choose an appropriate content-type for the data being transferred. On the other hand, most middleware systems work much more poorly with content types that do not match their native formats.
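For concreteness, here is a small Python sketch of both encodings using only the standard library's `urllib.parse`. The host `api.example.com` and the parameter names are placeholders invented for the example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Simple RPC-like parameters (hypothetical names and values).
params = {"op": "lookup", "user": "josh", "limit": "10"}

# GET: the parameters travel URI-encoded in the query string.
get_url = "http://api.example.com/users?" + urlencode(params)

# POST: the same encoding travels in the body, labeled by its MIME type.
post_headers = {"Content-Type": "application/x-www-form-urlencoded"}
post_body = urlencode(params).encode("ascii")

# The receiving side recovers the parameters the same way in both cases.
# parse_qs maps each name to a list, since a name may legally repeat.
from_get = parse_qs(urlsplit(get_url).query)
from_post = parse_qs(post_body.decode("ascii"))
```

Everything comes back as lists of strings, which illustrates the limitation mentioned above: anything richer than flat string parameters needs a content type chosen by the designer.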
Sure, if by “write” you mean “download a widely-deployed piece of software like Rails that will generate it for me.” The same is necessary with the main RPC-like framework I’ve worked in, by the way. Anyway, it’s not like any system is just going to let you download third-party software without writing some of your own to handle your specific logic for each request. Fortunately, both HTTP and ICE-style frameworks have implementations that will take care of the boring boilerplate parts like dispatching for you.
(In fact, Apache alone *is* a perfectly good dispatching system for many purposes - I’ve implemented quite a bit of functionality for some applications just using mod_rewrite and CGI.)
I’m not sure why you seem to think this is hand-waving. Every large-scale HTTP service I’ve seen (except the ones that are just tunnels for proprietary request-reply protocols!) makes use of HTTP caching, and most make use of proxies. I personally am responsible for four different services that publish feeds over HTTP, and four different applications that consume those feeds. These eight programs, implemented in three different languages, all see improved latency and decreased bandwidth thanks to their common support for HTTP’s cache expiry and validation.
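To show roughly what expiry and validation buy, here is a self-contained Python sketch with an in-memory stand-in for the server. `Origin` and `CachingClient` are hypothetical classes written for this example; a real HTTP library would handle the `Cache-Control` and `ETag` headers itself, and this toy only understands a bare `max-age` directive.

```python
import hashlib


class Origin:
    """Toy origin server (hypothetical) that supports ETag validation."""

    def __init__(self, body, max_age):
        self.body, self.max_age = body, max_age
        self.full_responses = 0  # how many times the full body was sent

    def get(self, if_none_match=None):
        etag = hashlib.sha256(self.body).hexdigest()[:16]
        headers = {"ETag": etag, "Cache-Control": f"max-age={self.max_age}"}
        if if_none_match == etag:
            return 304, headers, b""  # validation hit: body is not resent
        self.full_responses += 1
        return 200, headers, self.body


class CachingClient:
    """Toy client cache for a single resource (hypothetical)."""

    def __init__(self, origin, clock):
        self.origin, self.clock = origin, clock
        self.entry = None  # (body, etag, expires_at)
        self.requests = 0  # how many times the origin was contacted

    def fetch(self):
        now = self.clock()
        if self.entry and now < self.entry[2]:
            return self.entry[0]  # expiry: still fresh, no request at all
        etag = self.entry[1] if self.entry else None
        self.requests += 1
        status, headers, body = self.origin.get(if_none_match=etag)
        if status == 304:
            body = self.entry[0]  # reuse the cached body after revalidation
        max_age = int(headers["Cache-Control"].split("=")[1])
        self.entry = (body, headers["ETag"], now + max_age)
        return body
```

With a 60-second `max_age`, repeated calls to `fetch()` inside the window never touch the origin (lower latency), and the first call after the window costs a round trip but no body transfer if the feed is unchanged (lower bandwidth).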
With every HTTP library I’ve used, caching is built in - no need for separate software. Yes, if you decide to introduce proxies into your architecture then you’ll need to choose one to meet your needs. Again, I’m not sure why you think it’s a bad thing that the client/server framework doesn’t commit you to using just one proxy implementation.
Matt Brubeck
Matt,
You have wonderfully substantive arguments, as always! To address them fully I’ll need to write a full blog entry. There’s just one thing I want to answer at the moment:
I wasn’t trying to claim that there isn’t software that implements RESTful services, proxying, caching, wire formats, etc.; of course there is. I just wanted Steve to pick an actual implementation of one that he thinks fairly represents the REST ideal, so we could compare apples to apples (one request/reply implementation to another) and make objective comparisons like:
- what does “Hello, world!” look like?
- how do you document and support your APIs?
- how does the system handle various failure modes?
- how much work does it take for others to interoperate with you?
- how well can the system support adjusting the caching strategy?
What I wanted to avoid is going down a path of discussion where I would say “REST doesn’t support scenario X very well” and hear a retort like “the REST stack could support a feature that does X perfectly.” I want to talk about reality and not Plato’s world of forms. I want to compare actual implementations and not architectural ideals.
Anyway, I’m on the fence about whether I want to take the time to really thoroughly explain my point of view right now. On one hand, this has been on my mind for a while. On the other hand, this discussion has started to wear on me a bit and I really desperately want to get the next version of Gazelle out the door.
josh
I vote for Gazelle, personally.
Matt Brubeck