The overhead of abstraction in C/C++ vs. Python/Ruby
I’ve been working on some Python and Ruby libraries lately that wrap C extensions. An interesting and important observation came to me as I was doing some of the design.
Please note this is not meant to be any kind of commentary on the relative value of these languages. It’s a specific observation that is useful when you are crossing the language barrier and deciding where to draw the boundary between what should go in the high-level language and what should go in C or C++.
The observation I made is that C and C++ compilers can inline, whereas interpreters for Python and Ruby generally do not.
This may seem like a mundane observation, but what it means is that building abstractions in the high-level language has a noticeable cost, whereas in C and C++ simple abstractions built around function calls are basically free.
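One way to see the lack of inlining on the Python side is to disassemble a loop that calls a small helper. The following is a minimal sketch, assuming CPython 3; the names `add` and `loop_with_call` are made up for illustration, and the exact opcode names differ between CPython versions, which is why the code only looks for names starting with "CALL".

```python
import dis


def add(total, i):
    # Hypothetical helper standing in for a small abstraction layer.
    return total + i


def loop_with_call(n):
    total = 0
    for i in range(n):
        total = add(total, i)  # remains a real function call in the bytecode
    return total


# Collect the names of the bytecode instructions compiled for the loop.
opnames = [ins.opname for ins in dis.get_instructions(loop_with_call)]

# Opcode names vary by version (CALL_FUNCTION, CALL, ...), so match the prefix.
call_ops = [name for name in opnames if name.startswith("CALL")]
print(call_ops)
```

The call to `add` shows up as a call instruction inside the loop body, so it is dispatched on every iteration rather than being folded into the loop.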
To illustrate, take this Python program:
total = 0
for i in range(1000000):
    total += i
print(total)
Now suppose we want to abstract this a bit (this is a toy example, but mirrors the structure of real abstractions):
class Adder:
    def __init__(self):
        self.total = 0

    def add(self, i):
        self.total += i

adder = Adder()
for i in range(1000000):
    adder.add(i)
print(adder.total)
On my machine, the second example runs at less than half the speed of the first. (The same was true of Ruby when I tried equivalent programs.)
$ time python test.py
499999500000
real 0m0.158s
user 0m0.133s
sys 0m0.023s
$ time python test2.py
499999500000
real 0m0.396s
user 0m0.367s
sys 0m0.024s
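The same comparison can be made inside a single process with the standard `timeit` module, which avoids counting interpreter startup. This is only a sketch under some assumptions: the function names `plain_loop` and `adder_loop` are invented here, and because both loops now run inside functions (where local variable access is faster than the module-level globals used above), the ratio you see may differ from the figures above.

```python
import timeit

N = 1000000  # same iteration count as the programs above


class Adder:
    def __init__(self):
        self.total = 0

    def add(self, i):
        self.total += i


def plain_loop():
    # The addition is written directly in the loop body.
    total = 0
    for i in range(N):
        total += i
    return total


def adder_loop():
    # The same addition, hidden behind a method call.
    adder = Adder()
    for i in range(N):
        adder.add(i)
    return adder.total


# Take the best of three runs for each variant to reduce noise.
plain_time = min(timeit.repeat(plain_loop, number=1, repeat=3))
adder_time = min(timeit.repeat(adder_loop, number=1, repeat=3))

print("plain loop: %.3fs" % plain_time)
print("adder loop: %.3fs" % adder_time)
print("ratio:      %.2fx" % (adder_time / plain_time))
```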
Compare this with the C++ equivalent of the first program. It runs 100 times as many iterations, and I used “volatile” to prevent the compiler from being too smart and collapsing the loop completely:
#include <stdio.h>

int main() {
  volatile long total = 0;
  for (long i = 0; i < 100000000; i++) {
    total += i;
  }
  printf("%ld\n", total);
}
And the version with the adder abstracted into a class:
#include <stdio.h>

class Adder {
 public:
  Adder() : total(0) {}
  void add(long i) { total += i; }
  volatile long total;
};

int main() {
  Adder adder;
  for (long i = 0; i < 100000000; i++) {
    adder.add(i);
  }
  printf("%ld\n", adder.total);
}
On my machine, not only do they take the same amount of time, they compile to exactly the same machine code.
We already know that Python and Ruby are noticeably slower than C and C++ (again, not a dig; they serve different purposes), which suggests that performance-critical code should go in C or C++. But the extra observation here is that any layers or abstractions in Python or Ruby have an inherent cost, whereas in C or C++ you can layer abstractions much more freely without fear of additional overhead, particularly for functions or classes in a single source file.
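One consequence for where to draw the boundary, sketched here as an assumption rather than a recommendation from any particular library: if a Python-level call has a per-call cost, an interface that accepts a whole batch pays that cost once instead of once per element. The `add_many` method below is hypothetical, and the built-in `sum()` stands in for a loop that a C extension would run on the other side of the boundary.

```python
class Adder:
    def __init__(self):
        self.total = 0

    def add(self, i):
        # Fine-grained entry point: one Python-level call per element.
        self.total += i

    def add_many(self, values):
        # Hypothetical coarse-grained entry point: one Python-level call
        # per batch. The built-in sum() runs its loop in the interpreter's
        # C code, standing in for work done inside a C extension.
        self.total += sum(values)


per_item = Adder()
for i in range(1000000):
    per_item.add(i)  # one million method calls

batched = Adder()
batched.add_many(range(1000000))  # a single method call

# Both approaches compute the same result.
print(per_item.total == batched.total)
```

Both objects end up with the same total; the difference is only in how many times the Python-level call machinery is exercised.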