Floating-point math has a reputation for being very unpredictable and hard to understand. I think that a major reason for this is that floating-point values are hard to inspect. Take the following program:

#include <stdio.h>

int main() {
  double x = 0.1;
  printf("%f\n", x);
}

Here is the output I get on my system:

0.100000

You might be tempted to think that the variable x is indeed equal to the value 0.1, but don’t be fooled! In fact 0.1 is a rounded version of x’s true value, which will become apparent if we ask for more precision:

#include <stdio.h>

int main() {
  double x = 0.1;
  printf("%.30f\n", x);
}
0.100000000000000005551115123126

It’s hard to understand what we can’t easily inspect. To remedy this, I’ve just written a new tool called dumpfp. Thing of it as the floating-point toString() method you never had. It prints the precise value of the number, in both rational and decimal forms:

$ ./dumpfp 0.1
Single Precision (IEEE 32-bit):
           raw = 0x3dcccccd
          sign = 0x0
      exponent = 0x7b (-4)
   significand = 0x4ccccd

   VALUE CALCULATION =
       significand   (1 + 5033165/2^23  (1.60000002384185791015625))
     * 2^exponent    (2^-4)
     = VALUE         (13421773/2^27  (0.100000001490116119384765625))

Double Precision (IEEE 64-bit):
           raw = 0x3fb999999999999a
          sign = 0x0
      exponent = 0x3fb (-4)
   significand = 0x999999999999a

   VALUE CALCULATION =
       significand   (1 + 1351079888211149/2^51  (1.600000000000000088817841970012523233890533447265625))
     * 2^exponent    (2^-4)
     = VALUE         (3602879701896397/2^55  (0.1000000000000000055511151231257827021181583404541015625))

Notice that the actual value was not exactly 0.1, because binary floating point can only exactly represent rational values with a power-of-two denominator. You can see that the approximation is closer for double than it is for float

The tool first breaks down the raw bytes of the value into its constituent parts. IEEE floating-point values consist of a sign bit, some number of exponent bits, followed by a significand. These values are then combined together according to the expression significand * 2^exponent. The tool will show you all of the intermediate values and the final result

You’ll notice that these numbers have many decimal digits. Not all of these are significant. We can calculate how many digits are significant but I couldn’t think of an easy-to-read way of printing the exact value but also indicating which digits are significant. If anyone has a bright idea here, please do drop me a line (or fork me on GitHub).

The tool’s output is less noisy for a value that can be represented exactly:

$ ./dumpfp 0.5
Single Precision (IEEE 32-bit):
           raw = 0x3f000000
          sign = 0x0
      exponent = 0x7e (-1)
   significand = 0x0

   VALUE CALCULATION =
       significand   (1 + 0/2^0  (1.0))
     * 2^exponent    (2^-1)
     = VALUE         (1/2^1  (0.5))

Double Precision (IEEE 64-bit):
           raw = 0x3fe0000000000000
          sign = 0x0
      exponent = 0x3fe (-1)
   significand = 0x0

   VALUE CALCULATION =
       significand   (1 + 0/2^0  (1.0))
     * 2^exponent    (2^-1)
     = VALUE         (1/2^1  (0.5))

You can also use it for special values like NaN or Infinity:

$ ./dumpfp nan
Single Precision (IEEE 32-bit):
           raw = 0x7fc00000
          sign = 0x0
      exponent = 0xff (NaN or Infinity)
   significand = 400000 (non-zero indicates NaN)

Double Precision (IEEE 64-bit):
           raw = 0x7ff8000000000000
          sign = 0x0
      exponent = 0x7ff (NaN or Infinity)
   significand = 8000000000000 (non-zero indicates NaN)

And for very large values it will print the full integer value:

$ ./dumpfp 1e30
Single Precision (IEEE 32-bit):
           raw = 0x7149f2ca
          sign = 0x0
      exponent = 0xe2 (99)
   significand = 0x49f2ca

   VALUE CALCULATION =
       significand   (1 + 2423141/2^22  (1.5777218341827392578125))
     * 2^exponent    (2^99)
     = VALUE         (1000000015047466219876688855040)

Double Precision (IEEE 64-bit):
           raw = 0x46293e5939a08cea
          sign = 0x0
      exponent = 0x462 (99)
   significand = 0x93e5939a08cea

   VALUE CALCULATION =
       significand   (1 + 1300913865115253/2^51  (1.577721810442023642195863430970348417758941650390625))
     * 2^exponent    (2^99)
     = VALUE         (1000000000000000019884624838656)

I learned a lot by writing this tool, and I hope it helps you understand floating-point better. Floating-point numbers don’t have to be that mysterious: they have very specific values as we can see. The trickiness comes from the fact that values get rounded if they can’t be represented exactly; hopefully this tool will make it clear when a value can be represented or not.

For example, one thing that can run you into trouble is trying to add two numbers where one is much larger than the other. Suppose you wanted to add 1 + 1e-16. The result should be 1.00000000000000001, but can this number be represented in floating-point?

$ ./dumpfp 1.00000000000000001
Single Precision (IEEE 32-bit):
           raw = 0x3f800000
          sign = 0x0
      exponent = 0x7f (0)
   significand = 0x0

   VALUE CALCULATION =
       significand   (1 + 0/2^0  (1.0))
     * 2^exponent    (2^0)
     = VALUE         (1 + 0/2^0  (1.0))

Double Precision (IEEE 64-bit):
           raw = 0x3ff0000000000000
          sign = 0x0
      exponent = 0x3ff (0)
   significand = 0x0

   VALUE CALCULATION =
       significand   (1 + 0/2^0  (1.0))
     * 2^exponent    (2^0)
     = VALUE         (1 + 0/2^0  (1.0))

We see here that the number 1.00000000000000001 can’t be represented in floating-point, and the closest approximation we had available to us was 1.0. We lost the smaller number completely! It’s not that operations like addition are imprecise, it’s that the result of the operation might not be possible to store in a floating-point value, so it rounds to the nearest representable number.