GCC: the impressive and the disappointing

In my work on upb I’ve looked at a lot of compiler-generated assembly code. I frequently want to know how GCC will compile a certain block of code, so I’ll write a little test program in C and use objdump to look at the object file.

Over two years of doing this, I’ve had many moments where I am pleasantly surprised at how smart the compiler is being. C compilers can figure out a lot. I demonstrated a few examples of this on stackoverflow.com – see here and here.

But once in a while I am disappointed, because GCC doesn’t figure something out that I really wish it could. For example, I have a callback interface where one of the parameters to my callback is something that a lot of clients don’t need. For convenience, I want to let them register a callback that doesn’t take the final parameter. But to be ANSI C, I can’t just cast one callback type to the other and call through it; I need to store the two possible callback types in a union. My quick test looked like this:

// My two callback types -- the second takes an extra parameter.
typedef union {
  void (*f1)(int);
  void (*f2)(int, int);
} funcs;

// I store "which" separately to track which type of callback was registered.
void foo(funcs f, int which) {
  int a = 5, b = 10;
  if (which) {
    f.f1(a);
  } else {
    f.f2(a, b);
  }
}
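For context, here is a minimal sketch of how the registration side of such an interface might look. It is not upb's actual code: the names `handler`, `register_short`, `register_full` and `dispatch`, and the two demo callbacks, are all hypothetical. The point is only that the union and its tag are written together, so the call site always reads the member that was last stored.

```c
#include <assert.h>
#include <stddef.h>

/* Same union as in the test above. */
typedef union {
  void (*f1)(int);
  void (*f2)(int, int);
} funcs;

/* Hypothetical wrapper: keeps the union and its tag together. */
typedef struct {
  funcs f;
  int which;  /* nonzero: f.f1 is active; zero: f.f2 is active */
} handler;

/* Register a callback that does not take the final parameter. */
static void register_short(handler *h, void (*cb)(int)) {
  assert(cb != NULL);
  h->f.f1 = cb;
  h->which = 1;
}

/* Register a callback that takes both parameters. */
static void register_full(handler *h, void (*cb)(int, int)) {
  assert(cb != NULL);
  h->f.f2 = cb;
  h->which = 0;
}

/* Calls through whichever union member was last written. */
static void dispatch(const handler *h, int a, int b) {
  if (h->which) {
    h->f.f1(a);
  } else {
    h->f.f2(a, b);
  }
}

/* Demo callbacks (assumptions for illustration): record what they saw. */
static int last_a, last_b, calls;
static void on_value(int a) { last_a = a; last_b = -1; calls++; }
static void on_pair(int a, int b) { last_a = a; last_b = b; calls++; }
```

A client would call `register_short(&h, on_value)` or `register_full(&h, on_pair)` once, and the library would then call `dispatch(&h, a, b)` for each event. The branch in `dispatch` is the same one as in `foo` above.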

Now on x86-64, parameters are passed in registers (thank god!), so there’s no actual reason to branch here. You might as well always put both values in registers: if we end up calling “f1”, it will just ignore the register that’s holding the second parameter, or overwrite it, which is fine too. The only reason we put the “if” in the C code was to be ANSI C compliant – according to the standard, you can’t call a function through a pointer to an incompatible function type.

But alas, GCC didn’t figure this out:

0000000000000000 <foo>:
   0:	85 f6                	test   esi,esi
   2:	48 89 f8             	mov    rax,rdi
   5:	75 11                	jne    18 <foo+0x18>
   7:	be 0a 00 00 00       	mov    esi,0xa
   c:	bf 05 00 00 00       	mov    edi,0x5
  11:	ff e0                	jmp    rax
  13:	0f 1f 44 00 00       	nop    DWORD PTR [rax+rax*1+0x0]
  18:	bf 05 00 00 00       	mov    edi,0x5
  1d:	ff e0                	jmp    rax

Notice it put a branch in there (“jne”) just to avoid putting the value 10 (0xa) in register esi for the one-argument path.

It’s even worse if I give both functions one parameter, but of different types:

typedef union {
  void (*f1)(int);
  void (*f2)(long);
} funcs;

void foo(funcs f, int which) {
  int a = 5;
  if (which) {
    f.f1(a);
  } else {
    f.f2(a);
  }
}

In this case, GCC again generates a branch, but both paths have identical code in them!

0000000000000000 <foo>:
   0:	85 f6                	test   esi,esi
   2:	48 89 f8             	mov    rax,rdi
   5:	75 09                	jne    10 <foo+0x10>
   7:	bf 05 00 00 00       	mov    edi,0x5
   c:	ff e0                	jmp    rax
   e:	66 90                	xchg   ax,ax
  10:	bf 05 00 00 00       	mov    edi,0x5
  15:	ff e0                	jmp    rax

Both branches simply move 5 into edi and jump to rax. There is absolutely no reason to branch here. Sigh.

C compilers are smart, but have their limits. Another thing this demonstrates is how programming in C can be a bit constraining compared to assembly language. The only reason I’m jumping through these hoops to begin with is that C has very strict rules about function pointers: you can’t just cast one function pointer type to another and call through the result, because you’ll get undefined behavior. But if you’re programming in assembly there is no undefined behavior, no worrying about aliasing, etc. The code I’m trying to get C to generate in a standards-compliant way would be trivial to write in assembly language directly.

Of course, in that case I’d have to write the assembly for every platform I wanted to target. In the end I’ll probably use the branchy version that the compiler generates; the branch will probably predict pretty well, and more importantly, for the real fast path for protobuf decoding I have a JIT on the way…