The overhead of abstraction in C/C++ vs. Python/Ruby

I’ve been working on some Python and Ruby libraries lately that wrap C extensions. An interesting and important observation came to me as I was doing some of the design.

Please note this is not meant to be any kind of commentary on the relative value of these languages. It’s a specific observation that is useful when you are crossing the language barrier and deciding where to draw the boundary between what should go in the high-level language and what should go in C or C++.

The observation I made is that C and C++ compilers can inline, whereas interpreters for Python and Ruby generally do not.

This may seem like a mundane observation, but what it means is that building abstractions in the high-level language has a noticeable cost, whereas in C and C++ simple abstractions built around function calls are basically free.
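A rough way to see this from the Python side is to disassemble two versions of a loop and count the call opcodes. This is only a sketch, assuming CPython 3; the names `add`, `loop_inline`, `loop_with_call`, and `count_calls` are made up for illustration. The call to the helper is still a call in the bytecode, and nothing folds it into the loop body.

```python
import dis


def add(total, i):
    # Trivial helper: the kind of function a C++ compiler would inline.
    return total + i


def loop_inline(n):
    total = 0
    for i in range(n):
        total += i
    return total


def loop_with_call(n):
    total = 0
    for i in range(n):
        total = add(total, i)
    return total


def count_calls(func):
    # Count CALL-family opcodes (CALL, CALL_FUNCTION, ...); the exact
    # opcode names vary between CPython versions.
    return sum(1 for ins in dis.get_instructions(func)
               if ins.opname.startswith("CALL"))


print("inline:   ", count_calls(loop_inline))
print("with call:", count_calls(loop_with_call))
```

The second count should come out higher than the first, because the call to `add` survives compilation and is executed on every iteration.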

To illustrate, take this Python program:

total = 0

for i in range(1000000):
  total += i

print(total)

Now suppose we want to abstract this a bit (this is a toy example, but mirrors the structure of real abstractions):

class Adder:
  def __init__(self):
    self.total = 0

  def add(self, i):
    self.total += i

adder = Adder()

for i in range(1000000):
  adder.add(i)

print(adder.total)

On my machine, the second example runs at less than half the speed of the first. (The same was true of Ruby when I tried equivalent programs.)

$ time python test.py
499999500000

real    0m0.158s
user    0m0.133s
sys     0m0.023s
$ time python test2.py
499999500000

real    0m0.396s
user    0m0.367s
sys     0m0.024s
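If you want to repeat the comparison inside a single process, something like the following `timeit` sketch should work. It assumes Python 3, and the function names `plain_loop` and `adder_loop` are mine. Both loops are wrapped in functions here, so the absolute numbers will not match the timings above; the point is only the gap between the two.

```python
import timeit


class Adder:
    def __init__(self):
        self.total = 0

    def add(self, i):
        self.total += i


def plain_loop(n):
    total = 0
    for i in range(n):
        total += i
    return total


def adder_loop(n):
    adder = Adder()
    for i in range(n):
        adder.add(i)
    return adder.total


N = 1000000  # same iteration count as the Python programs above

# Best of 3 runs each; absolute numbers depend on machine and interpreter.
plain = min(timeit.repeat(lambda: plain_loop(N), number=1, repeat=3))
wrapped = min(timeit.repeat(lambda: adder_loop(N), number=1, repeat=3))

print("plain loop: %.3fs" % plain)
print("Adder loop: %.3fs" % wrapped)
```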

Compare this with the equivalent first program in C++, which runs 100 times as many iterations (I used “volatile” to prevent the compiler from being too smart and collapsing the loop completely):

#include <stdio.h>

int main() {
  volatile long total = 0;

  for (long i = 0; i < 100000000; i++) {
    total += i;
  }

  printf("%ld\n", total);
}

And the version with the adder abstracted into a class:

#include <stdio.h>

class Adder {
 public:
  Adder() : total(0) {}

  void add(long i) { total += i; }

  volatile long total;
};

int main() {
  Adder adder;

  for (long i = 0; i < 100000000; i++) {
    adder.add(i);
  }

  printf("%ld\n", adder.total);
}

On my machine, not only do they take the same amount of time, they compile to exactly the same machine code.

We already know that Python and Ruby are noticeably slower than C and C++ (again, not a dig; they serve different purposes), which suggests that performance-critical code should go in C or C++. But the extra observation here is that every layer of abstraction in Python or Ruby has an inherent cost, whereas in C or C++ you can layer abstractions much more freely without fear of additional overhead, particularly for functions or classes in a single source file.
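One way to act on this when choosing the boundary is to keep the per-element loop on the C side, so the high-level code crosses over once rather than once per item. In this sketch the built-in `sum` stands in for C-implemented code; that is an assumption for illustration, not a real extension module, and the function names are hypothetical.

```python
def python_level_sum(n):
    # Every iteration executes interpreted bytecode.
    total = 0
    for i in range(n):
        total += i
    return total


def c_level_sum(n):
    # In CPython, sum() is implemented in C, so the loop itself runs in C
    # and the high-level code makes a single call across the boundary.
    return sum(range(n))


n = 1000000
assert python_level_sum(n) == c_level_sum(n)
print(c_level_sum(n))
```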