Friday, March 1, 2013

C/C++ Gripe #1: integer types

C is a great language. When someone asks "what would you change about C?" it is not easy for me to think of something the language just plain got wrong. And while C++ is large and complicated, I generally feel that, for what it is trying to be, it is pretty well done too.

I wanted to preface this entry with that, because the word "gripe" could be taken to mean that I am a C or C++ hater. Not the case; they are my favorite languages that I use regularly. But with the benefit of hindsight, I think it's worth mentioning a few of their design choices here and there that make life difficult, where a genuinely better alternative exists. Which brings me to this entry.

C's integer types are well-explained in the Wikipedia entry C data types. The types char, short, int, long, long long and their unsigned equivalents are defined without specifying their exact size, but instead according to a loose set of rules which, in my experience, are rarely useful.

Until C99 there was no standard way of declaring integers of a specified size (e.g. a 16-bit signed integer). You could declare a signed short; this is guaranteed to be at least 16 bits, but it could be larger. The lack of fixed-width types led to every project reinventing the same typedefs over and over: wxInt32 from wxWidgets, qint32 from Qt, gint32 from glib. Almost every C or C++ library would eventually find itself defining these same typedefs. But thankfully in C99 we got stdint.h, which gives us fixed-width types like int32_t in the standard library! Problem solved, no?
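
To make that concrete, here is roughly the kind of typedef every library used to write, next to the stdint.h equivalent (the "my_" names and the exact underlying types are invented for illustration; real libraries pick the underlying types per-platform with #ifdefs):
// Pre-C99: each library rolls its own fixed-width names and hopes the
// underlying types have the expected sizes on this platform.
typedef short my_int16_t;
typedef int my_int32_t;

// C99 and later: stdint.h provides the names for everyone.
#include <stdint.h>
int16_t a;   // exactly 16 bits
int32_t b;   // exactly 32 bits
uint32_t c;  // exactly 32 bits, unsigned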

Well, unfortunately not quite. int32_t is just a typedef, and there can be multiple primitive types that are 32 bits wide. For example, in the ILP32 programming model, both int and long are 32 bits. So it's totally arbitrary which of these typedefs appears in stdint.h:
// Both of these are equally valid to have in stdint.h:
typedef int int32_t;
typedef long int32_t;
And the really unfortunate part is that int and long are still distinct, incompatible types, which means that this program won't compile:
// From library A:
typedef int a_int32_t;

// Passes a pointer to a function that takes a 32-bit integer.
void regfunc(void (*f)(a_int32_t));

// From library B:
typedef long b_int32_t;

void my_callback(b_int32_t x) { /* ... */ }

int main() {
  regfunc(&my_callback);  // error: incompatible function pointer types
}
This code will fail to compile because void my_callback(long) is not compatible with void f(int), even though both long and int are 32 bits! And in this case both were used through fixed-width typedefs, so the code looks like it should work.

What would have been better is if the primitive types had been defined in terms of the fixed-width types (int32_t, uint32_t, etc.). Then, if desired, the more loosely-defined types like int, long, etc. could be the typedefs. If things were defined this way, you would never run into this problem where two integer types are the same size yet incompatible.
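
To see why, notice that the incompatibility above only exists because there are two distinct 32-bit primitives for the typedefs to disagree about. If there were exactly one, the two libraries' typedefs would necessarily name the same type and the earlier program would compile. A sketch (regfunc is given a trivial body here just so the example links):
// With a single 32-bit primitive, the two typedefs cannot disagree.
typedef int a_int32_t;
typedef int b_int32_t;  // same underlying type as a_int32_t

void regfunc(void (*f)(a_int32_t)) { f(42); }

void my_callback(b_int32_t x) { /* ... */ }

int main() {
  regfunc(&my_callback);  // compiles: the function pointer types are identical
}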

A possible idea for improving the status quo would be to make primitive types compatible if they are the same size. That would make it legal to convert between the two function pointer types, and would probably require basically no work in real-world compilers to enable.

However, that doesn't solve a similar problem in C++, which happens when you partially specialize on a_int32_t (for example), only to find that your partial specialization doesn't apply to b_int32_t. Fixing this is not as easy, because some users could have code that depends on these two types having different specializations, even though they are the same size.
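
A minimal sketch of that problem, using a full specialization for brevity (the same mismatch bites partial specializations) and assuming an ILP32-style platform where int and long are both 32 bits:
#include <iostream>

typedef int a_int32_t;   // library A's 32-bit typedef
typedef long b_int32_t;  // library B's 32-bit typedef

// Primary template: "not the 32-bit type I handle specially."
template <typename T> struct Is32Bit { enum { value = 0 }; };

// A specialization written against library A's typedef...
template <> struct Is32Bit<a_int32_t> { enum { value = 1 }; };

int main() {
  // ...silently fails to apply to library B's typedef, even though
  // both typedefs are 32 bits wide.
  std::cout << Is32Bit<a_int32_t>::value << "\n";  // prints 1
  std::cout << Is32Bit<b_int32_t>::value << "\n";  // prints 0
}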

In closing: if you invent a new language, please make the primitive integer types fixed-width. Your users will thank you.

7 comments:

  1. C and C++ can't do this because they have to work on platforms with chars that aren't 8 bits. If you read the C and C++ standards, you'll notice that types like int8_t are completely optional. I've worked on systems with 24-bit bytes: char, short, and int were all 24 bits, and longs were 48 bits. It would break programs if on one system int and short aliased but long didn't, and on another system int and long aliased but short didn't. The best thing you can do is just stick to int, size_t, and int_least#_t types, which are guaranteed to exist.

    1. I would almost call that gripe #0, that C and C++ go to great lengths to support unusual architectures that are either historic (e.g. PDP-11) or are only used for deeply embedded systems. (I use the qualifier "deeply" since increasingly these days the notion of an "embedded system" is something along the lines of a PowerPC, ARM, or MIPS running Linux with 512 megs of RAM.) Such flexibility should be split off into a separate language (such as "C for microcontrollers") without complicating software development for modern, general-purpose computers (which can even be said to include smartphones, tablets, routers, televisions, and many other allegedly "embedded" systems), which essentially only come in two flavors: ILP32 or LP64.

      Java (for example) takes an approach that greatly simplifies the mental burden on the developer: make assumptions such as 8-bit bytes, IEEE floating point, two's complement arithmetic, etc. which are true on all modern, non-deeply embedded computers. The next version of the C standard should make concessions to the fact that this is the norm, rather than merely one of many equally acceptable possibilities. After all, nearly all computers in the future will be networked, and all Internet standards are based around the notion of an "octet." UTF-18 was an April Fool's joke.

    2. My proposal is totally compatible with weird hardware. I am proposing that int8_t would be an optional type, just like it is now, but when it is present it is a built-in type instead of a typedef.

      Just like now, that means that software using int8_t would fail to compile on the weird hardware you mention. But software that depends on 8-bit bytes is unlikely to work on weird hardware anyway, even if it can compile. It's not doing anyone any favors if the software compiles but doesn't work at all. Assumptions that CHAR_BIT == 8 are deeply baked into most software. You don't get portability to weird systems "for free" with C; you have to be vigilant about avoiding these assumptions, and most software isn't vigilant because portability to these systems is not a priority.

      Just like now, software that *does* care about portability to hardware like this could stick to int, size_t, and int_leastn_t like you suggest. I think my proposal could actually have worked if C and C++ had been designed this way.
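
      An aside: one common pre-C11 idiom for making the CHAR_BIT == 8 assumption explicit at compile time (C11's _Static_assert is the nicer modern spelling), next to the types that are guaranteed to exist. A sketch:
      #include <limits.h>   // CHAR_BIT
      #include <stdint.h>   // int_least32_t is mandatory; int32_t is optional

      // Fails to compile (negative array size) on platforms without 8-bit
      // bytes, instead of compiling and then misbehaving at runtime.
      typedef char bytes_are_8_bits[CHAR_BIT == 8 ? 1 : -1];

      // Assumes only "at least 32 bits," so it exists on every platform.
      int_least32_t counter;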

  2. I for one don't get the "long" type. It feels archaic and often misunderstood, since it was misunderstood by design.

    Had to make a patch for locally compiled (msvc) library - I think jpeglib where it had "typedef long int32_t" which was conflicting with

    1. It looks like perhaps your comment got cut off in the middle?

  3. The thing I hate about C/C++ integers is the set of rules for promotions and overflow in arithmetic. Though not impossible to learn and remember, the logic is just a bit too complicated to collapse nicely in all cases. Signed/unsigned types have very different behaviors, and the "integer rank" relative to "int" (and relative to each other in a binary op) makes a huge difference. All of this makes it challenging to write simple and robust overflow checking logic in C, though it's a little bit easier in C++.

    1. Totally agree, Dave. You might have seen I actually wrote a whole article about overflow checking in C++: http://blog.reverberate.org/2012/12/testing-for-integer-overflow-in-c-and-c.html
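
      A small illustration of the kind of case the comment above describes, where the signed/unsigned distinction and integer rank pull in different directions (assumes the typical 16-bit short and 32-bit int; a sketch, not an exhaustive tour of the rules):
      #include <stdio.h>

      int main() {
        unsigned int u = 1;
        int i = -1;
        // int vs. unsigned int (same rank): i is converted to unsigned and
        // becomes a huge value, so this prints 0.
        printf("%d\n", i < u);

        unsigned short us = 1;
        short s = -1;
        // short vs. unsigned short (rank below int): both promote to signed
        // int first, so this prints 1.
        printf("%d\n", s < us);
        return 0;
      }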
