Making Knuth's wish come true: the x32 ABI
Several years ago (though I can’t say exactly how many since it’s not dated) Knuth made the following complaint:
**A Flame About 64-bit Pointers** It is absolutely idiotic to have 64-bit pointers when I compile a program that uses less than 4 gigabytes of RAM. When such pointer values appear inside a struct, they not only waste half the memory, they effectively throw away half of the cache. The gcc manpage advertises an option "-mlong32" that sounds like what I want. Namely, I think it would compile code for my x86-64 architecture, taking advantage of the extra registers etc., but it would also know that my program is going to live inside a 32-bit virtual address space. Unfortunately, the -mlong32 option was introduced only for MIPS computers, years ago. Nobody has yet adopted such conventions for today's most popular architecture. Probably that happens because programs compiled with this convention will need to be loaded with a special version of libc. Please, somebody, make that possible.
I always thought this made a lot of sense. People have asked distro-makers for this before without a lot of success, but it looks like this is now being worked on by high-profile people in the Linux community. It is called the x32 ABI (see the LWN coverage for a more digestible description). It’s exciting because in some benchmarks this can outperform the x86-64 ABI by 10% or more. It’s a tradeoff: if you don’t need to address more than 4GB of memory, you can get faster programs because smaller pointers let more of your data fit in each cache line. You’ll use less memory too.
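To make the cache argument concrete, here is a rough sketch that models how a C compiler would lay out a small linked-list node with 8-byte versus 4-byte pointers. The node layout, the 64-byte cache line, and the helper names (`struct_size`, `list_node`) are all assumptions made for illustration; this is a simplified model of natural alignment, not output from a real compiler.

```python
from typing import List, Tuple

CACHE_LINE = 64  # bytes; assumed here, typical for current x86-64 CPUs


def struct_size(fields: List[Tuple[int, int]]) -> int:
    """Size of a C struct given (size, alignment) pairs, using natural alignment."""
    offset = 0
    max_align = 1
    for size, align in fields:
        # Pad up to the field's alignment, then place the field.
        offset = (offset + align - 1) // align * align
        offset += size
        max_align = max(max_align, align)
    # Tail padding so that arrays of the struct stay aligned.
    return (offset + max_align - 1) // max_align * max_align


def list_node(pointer_bytes: int) -> List[Tuple[int, int]]:
    """Hypothetical node: struct { node *next; node *prev; int32_t value; }."""
    ptr = (pointer_bytes, pointer_bytes)
    return [ptr, ptr, (4, 4)]


if __name__ == "__main__":
    for name, width in (("x86-64", 8), ("x32", 4)):
        size = struct_size(list_node(width))
        print(f"{name}: {size} bytes per node, "
              f"{CACHE_LINE // size} whole nodes per cache line")
```

Under these assumptions the node shrinks from 24 bytes to 12, which is exactly the "waste half the memory" effect Knuth describes for pointer-heavy structs.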
This could have been done in a way that operated nearly the same as “compatibility mode” (i.e. running 32-bit binaries on a 64-bit CPU/OS), which would have required only minimal changes to the kernel/toolchain. But it looks like their plans are more ambitious: they want to be able to use the optimized SYSCALL64 instruction (which is “much faster” than `int 0x80` according to H. Peter Anvin), and they’re looking at fixing other problems like 32-bit `time_t`. So it’s a more substantial effort, but it looks like there’s significant interest and momentum behind this.
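For anyone who has not run into the 32-bit `time_t` problem, the sketch below shows where a signed 32-bit count of seconds since the Unix epoch runs out, and what it wraps around to. The helper names (`wrap_int32`, `as_utc`) are made up for this example, and the 64-bit comparison assumes the fix is simply a wider `time_t`, which the paragraph above does not spell out.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # rough average, for scale only


def wrap_int32(n: int) -> int:
    """Reduce n to the range of a signed 32-bit integer (two's complement)."""
    return (n + 2**31) % 2**32 - 2**31


def as_utc(seconds: int) -> datetime:
    """Interpret a seconds-since-epoch count as a UTC timestamp."""
    return EPOCH + timedelta(seconds=seconds)


if __name__ == "__main__":
    print("last 32-bit second:  ", as_utc(INT32_MAX))
    print("one second later is: ", as_utc(wrap_int32(INT32_MAX + 1)))
    print("64-bit range, years: ", f"{INT64_MAX / SECONDS_PER_YEAR:.3g}")
```

The last representable moment is in January 2038, after which a 32-bit counter wraps back to December 1901.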
Thinking about how this would affect upb, my impression is that I could use my x86-64 JIT unmodified with x32, since it appears to have all of the same calling conventions. It has the same set of callee-saved registers and the same set of registers for parameter passing, and I think these are the main things upb’s JIT-ted code depends on.
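As a reminder of what "the same calling conventions" covers, here is a small table-driven sketch of the System V x86-64 integer argument registers and callee-saved registers. It is not upb code: the function names and the `stack[n]` labels are invented for illustration, and it ignores floating-point and struct arguments entirely.

```python
# Integer/pointer argument registers, in order, for the System V x86-64 ABI.
INT_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")

# Registers a callee must preserve across a call (besides the stack pointer).
CALLEE_SAVED = ("rbx", "rbp", "r12", "r13", "r14", "r15")


def place_int_args(count: int) -> list:
    """Return where each of `count` integer arguments is passed."""
    places = []
    for i in range(count):
        if i < len(INT_ARG_REGS):
            places.append(INT_ARG_REGS[i])
        else:
            # Hypothetical label for the n-th stack-passed argument.
            places.append(f"stack[{i - len(INT_ARG_REGS)}]")
    return places


def survives_call(reg: str) -> bool:
    """True if JIT-ted code may keep a value in `reg` across a call."""
    return reg in CALLEE_SAVED


if __name__ == "__main__":
    print(place_int_args(8))
    print([r for r in ("rax", "rbx", "r10", "r12") if survives_call(r)])
```

What a table like this does not capture is pointer width: under x32 a pointer is 4 bytes, so any place where generated code loads or stores a pointer as an 8-byte value would probably still need a second look.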