Torn over the C++ question

December 2nd, 2009 Josh 6 comments

I am having a very difficult time deciding whether to go through with the C++ port of upb or to stay in C.

I’ve ported about one third of upb to C++, on a branch, to see how it would turn out. It was a ton of work. Here are my current observations:

  • The C++ is cleaner, more readable, less error-prone code. It’s just a fact. Compare for yourself (C: upb_def.h, upb_def.c; C++: upb_def.h, upb_def.cc). This is due to numerous factors:
    • type-safe containers mean fewer casts.
    • “public” and “private” keywords make it easy to separate the private parts of your interface, without having to specify in comments which is which.
    • namespaces and class scope mean that I don’t have to write out my identifiers like upb_fielddef_dothis(), I can just write DoThis().
    • real inheritance and member classes mean I don’t have to explicitly call all the right constructors/destructors, or write explicit casts for upcasts.
    • destructors that are guaranteed to run on scope exit mean I can use RAII patterns like mutexes that automatically unlock when the scope is exited.
  • The source got shorter; the portion I ported went from 1483 lines to 1133, a reduction of about 24%.
  • The binary got a LOT bigger. I had one function get literally 5x as big. I haven’t figured out why this happened yet. I used templates to make the table generic, but I was extremely careful to make sure that the template only generated a small amount of code — basically just the hash lookup routine, which is small (note: the hash function for strings was not templated or inlined). But another issue is that the C++ compiler appears to emit multiple copies of the same function in the same object file! For example, I found some virtual destructors emitted literally three times in the same file. Why is this?
  • I just heard back from a security guru from the Google security team, who said that C is often easier to audit than C++ because it’s easier to figure out what is actually going on, without having to dig through layers of abstraction. This surprised me (maybe it shouldn’t have, since Sam Quigley said the same thing in a comment on my last entry), but I was also a little bit relieved.
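
To make the RAII and public/private points from the list above concrete, here is a minimal sketch. Everything in it is hypothetical and for illustration only: the namespace `sketch` and the names `ScopedLock`, `Counter`, and `IncrementUpTo` are not from upb. It assumes a C++11 compiler for `std::mutex` and `= delete`; the guard is hand-written only to show the mechanism, and `std::lock_guard` does the same job.

```cpp
#include <cassert>
#include <mutex>

namespace sketch {  // hypothetical namespace, not upb's

// Locks in the constructor, unlocks in the destructor. The destructor runs
// on every exit from the enclosing scope, including early returns.
class ScopedLock {
 public:
  explicit ScopedLock(std::mutex& m) : mutex_(m) { mutex_.lock(); }
  ~ScopedLock() { mutex_.unlock(); }

  // Copying a guard would unlock twice, so forbid it.
  ScopedLock(const ScopedLock&) = delete;
  ScopedLock& operator=(const ScopedLock&) = delete;

 private:
  std::mutex& mutex_;
};

class Counter {
 public:
  // Returns the new value, or -1 if the counter has already reached limit.
  // Class scope lets this be IncrementUpTo() rather than something like
  // sketch_counter_increment_up_to().
  int IncrementUpTo(int limit) {
    assert(limit >= 0);
    ScopedLock lock(mutex_);
    if (value_ >= limit) return -1;  // early return: still unlocks
    return ++value_;                 // normal return: also unlocks
  }

 private:  // the compiler enforces this, no comment convention needed
  std::mutex mutex_;
  int value_ = 0;
};

}  // namespace sketch
```

`IncrementUpTo` has two return paths and neither one mentions unlocking. In C, each path would need its own explicit unlock call. The flip side is the auditing concern raised in the list: the unlock happens, but nothing at the call site says so.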

I’m leaning towards sticking with C, for the following reasons:

  • C++ compilers aren’t very good at keeping things small, even when you are judicious with your use of templates.
  • C++ compilers are much more complicated than C compilers, and therefore not as ubiquitous or as easy to trust generally.
  • C isn’t harder to audit for security than C++, and may actually be easier.
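
For reference, one common way to be judicious with templates is to keep all the real logic in a non-template class that works on `void*`, and make the template a thin typed wrapper around it. The sketch below is an assumption about what that can look like, not the code from my branch (where the lookup routine itself was templated). The names `sketch`, `RawTable`, and `Table` are made up, and the table is a deliberately simplified fixed-size one keyed by integers, with no resizing or deletion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {  // hypothetical namespace, not upb's

// Untyped core. It is not a template, so its code is not duplicated for
// each value type. In a real project the bodies would live in a .cc file.
class RawTable {
 public:
  explicit RawTable(std::size_t capacity_pow2)
      : slots_(capacity_pow2), mask_(capacity_pow2 - 1) {
    // Capacity must be a nonzero power of two so that "& mask_" wraps.
    assert(capacity_pow2 != 0 && (capacity_pow2 & mask_) == 0);
  }

  // Linear probing. Returns false only when every slot is taken.
  bool Insert(std::uint32_t key, void* value) {
    for (std::size_t i = 0; i < slots_.size(); ++i) {
      Slot& s = slots_[(key + i) & mask_];
      if (!s.used || s.key == key) {
        s.key = key;
        s.value = value;
        s.used = true;
        return true;
      }
    }
    return false;
  }

  // Returns the stored pointer, or nullptr if the key is absent.
  void* Lookup(std::uint32_t key) const {
    for (std::size_t i = 0; i < slots_.size(); ++i) {
      const Slot& s = slots_[(key + i) & mask_];
      if (!s.used) return nullptr;
      if (s.key == key) return s.value;
    }
    return nullptr;
  }

 private:
  struct Slot {
    std::uint32_t key = 0;
    void* value = nullptr;
    bool used = false;
  };
  std::vector<Slot> slots_;
  std::size_t mask_;
};

// Typed wrapper. Each instantiation adds only forwarding calls and a cast,
// which an optimizing compiler will typically inline away.
template <typename T>
class Table {
 public:
  explicit Table(std::size_t capacity_pow2) : raw_(capacity_pow2) {}
  bool Insert(std::uint32_t key, T* value) { return raw_.Insert(key, value); }
  T* Lookup(std::uint32_t key) const {
    return static_cast<T*>(raw_.Lookup(key));
  }

 private:
  RawTable raw_;
};

}  // namespace sketch
```

Callers get the type safety (a `Table<int>` hands back `int*`, no casts at the call site), while the probing loop exists once no matter how many value types are used. Whether this actually keeps a given binary small still has to be checked by looking at the object file.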

I’ll try to take some of the lessons I learned from my partial C++ port to make the C more readable.
