
I’ve recently met a few people who were naive about how real (i.e., non-integer) numbers are represented in programming languages: double and float in Java, and their counterparts in other languages.

Since examples are worth much more than explanations, let's look at this:

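```java
public class FloatCompare {
    public static void main(String[] args) {
        double y = 0.082;
        double z = 8.2;
        // 0.082 and 8.2 are stored as the nearest binary doubles, and 100 * y is rounded again, so == compares two slightly different values
        System.out.println(y + " * 100 = " + z + "? " + (100 * y == z) + " it equals " + (100 * y));
    }
}
```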

And the output is:

> 0.082 * 100 = 8.2? false it equals 8.200000000000001
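A minimal sketch of two common workarounds for the comparison above, assuming Java 11 or later: compare with a relative tolerance instead of `==`, or do the arithmetic in decimal with `java.math.BigDecimal`. The class name `CompareDoubles`, the helper `nearlyEqual`, and the tolerance `1e-12` are illustrative choices, not a standard API; the right tolerance depends on your data.

```java
import java.math.BigDecimal;

public class CompareDoubles {
    // Hypothetical helper: a and b count as equal when they differ by at most
    // relTol times the larger magnitude. The tolerance value is an assumption.
    static boolean nearlyEqual(double a, double b, double relTol) {
        double diff = Math.abs(a - b);
        double scale = Math.max(Math.abs(a), Math.abs(b));
        return diff <= relTol * scale;
    }

    public static void main(String[] args) {
        double y = 0.082;
        double z = 8.2;

        System.out.println(100 * y == z);                      // false: two different rounded values
        System.out.println(nearlyEqual(100 * y, z, 1e-12));    // true: they differ by about 1.8e-15

        // Built from decimal strings, BigDecimal holds 0.082 and 8.2 exactly
        BigDecimal product = new BigDecimal("0.082").multiply(new BigDecimal("100"));
        System.out.println(product);                           // 8.200
        // compareTo ignores scale; equals would return false because 8.200 and 8.2 have different scales
        System.out.println(product.compareTo(new BigDecimal("8.2")) == 0); // true
    }
}
```

Note that `new BigDecimal(0.082)` (from a double rather than a String) would carry the binary rounding error along with it, so the string constructor is the one to use here.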

The next example shows exactly why you should be wary of floating-point numbers. It starts with 1/3 and repeatedly doubles it, keeping only the fractional part. Look at the binary representation after each iteration.

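```java
public class ThirdDoubling {
    public static void main(String[] args) {
        double x = 1.0 / 3.0;
        for (int i = 0; i < 55; i++) {
            System.out.println("Iteration: " + i + " x = " + x);
            System.out.println("Binary representation: " +
                    Long.toBinaryString(Double.doubleToRawLongBits(x)));
            // Keep only the fractional part of 2x: this shifts the binary expansion left by one digit
            x = x * 2;
            if (x >= 1) {
                x = x - 1;
            }
        }
    }
}
```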
> Iteration: 0 x = 0.3333333333333333  
> Binary representation: 11111111010101010101010101010101010101010101010101010101010101  
> Iteration: 1 x = 0.6666666666666666  
> Binary representation: 11111111100101010101010101010101010101010101010101010101010101  
> Iteration: 2 x = 0.33333333333333326  
> Binary representation: 11111111010101010101010101010101010101010101010101010101010100  
> Iteration: 3 x = 0.6666666666666665  
> Binary representation: 11111111100101010101010101010101010101010101010101010101010100  
> Iteration: 4 x = 0.33333333333333304  
> Binary representation: 11111111010101010101010101010101010101010101010101010101010000  
> Iteration: 5 x = 0.6666666666666661  
> Binary representation: 11111111100101010101010101010101010101010101010101010101010000  
> Iteration: 6 x = 0.33333333333333215  
> Binary representation: 11111111010101010101010101010101010101010101010101010101000000  
> Iteration: 7 x = 0.6666666666666643  
> Binary representation: 11111111100101010101010101010101010101010101010101010101000000  
> Iteration: 8 x = 0.3333333333333286  
> Binary representation: 11111111010101010101010101010101010101010101010101010100000000  
> Iteration: 9 x = 0.6666666666666572  
> Binary representation: 11111111100101010101010101010101010101010101010101010100000000  
> Iteration: 10 x = 0.3333333333333144  
> Binary representation: 11111111010101010101010101010101010101010101010101010000000000  
> Iteration: 11 x = 0.6666666666666288  
> Binary representation: 11111111100101010101010101010101010101010101010101010000000000  
> Iteration: 12 x = 0.33333333333325754  
> Binary representation: 11111111010101010101010101010101010101010101010101000000000000  
> Iteration: 13 x = 0.6666666666665151  
> Binary representation: 11111111100101010101010101010101010101010101010101000000000000  
> Iteration: 14 x = 0.33333333333303017  
> Binary representation: 11111111010101010101010101010101010101010101010100000000000000  
> Iteration: 15 x = 0.6666666666660603  
> Binary representation: 11111111100101010101010101010101010101010101010100000000000000  
> Iteration: 16 x = 0.3333333333321207  
> Binary representation: 11111111010101010101010101010101010101010101010000000000000000  
> Iteration: 17 x = 0.6666666666642413  
> Binary representation: 11111111100101010101010101010101010101010101010000000000000000  
> Iteration: 18 x = 0.3333333333284827  
> Binary representation: 11111111010101010101010101010101010101010101000000000000000000  
> Iteration: 19 x = 0.6666666666569654  
> Binary representation: 11111111100101010101010101010101010101010101000000000000000000  
> Iteration: 20 x = 0.3333333333139308  
> Binary representation: 11111111010101010101010101010101010101010100000000000000000000  
> Iteration: 21 x = 0.6666666666278616  
> Binary representation: 11111111100101010101010101010101010101010100000000000000000000  
> Iteration: 22 x = 0.3333333332557231  
> Binary representation: 11111111010101010101010101010101010101010000000000000000000000  
> Iteration: 23 x = 0.6666666665114462  
> Binary representation: 11111111100101010101010101010101010101010000000000000000000000  
> Iteration: 24 x = 0.3333333330228925  
> Binary representation: 11111111010101010101010101010101010101000000000000000000000000  
> Iteration: 25 x = 0.666666666045785  
> Binary representation: 11111111100101010101010101010101010101000000000000000000000000  
> Iteration: 26 x = 0.3333333320915699  
> Binary representation: 11111111010101010101010101010101010100000000000000000000000000  
> Iteration: 27 x = 0.6666666641831398  
> Binary representation: 11111111100101010101010101010101010100000000000000000000000000  
> Iteration: 28 x = 0.3333333283662796  
> Binary representation: 11111111010101010101010101010101010000000000000000000000000000  
> Iteration: 29 x = 0.6666666567325592  
> Binary representation: 11111111100101010101010101010101010000000000000000000000000000  
> Iteration: 30 x = 0.3333333134651184  
> Binary representation: 11111111010101010101010101010101000000000000000000000000000000  
> Iteration: 31 x = 0.6666666269302368  
> Binary representation: 11111111100101010101010101010101000000000000000000000000000000  
> Iteration: 32 x = 0.33333325386047363  
> Binary representation: 11111111010101010101010101010100000000000000000000000000000000  
> Iteration: 33 x = 0.6666665077209473  
> Binary representation: 11111111100101010101010101010100000000000000000000000000000000  
> Iteration: 34 x = 0.33333301544189453  
> Binary representation: 11111111010101010101010101010000000000000000000000000000000000  
> Iteration: 35 x = 0.6666660308837891  
> Binary representation: 11111111100101010101010101010000000000000000000000000000000000  
> Iteration: 36 x = 0.3333320617675781  
> Binary representation: 11111111010101010101010101000000000000000000000000000000000000  
> Iteration: 37 x = 0.6666641235351562  
> Binary representation: 11111111100101010101010101000000000000000000000000000000000000  
> Iteration: 38 x = 0.3333282470703125  
> Binary representation: 11111111010101010101010100000000000000000000000000000000000000  
> Iteration: 39 x = 0.666656494140625  
> Binary representation: 11111111100101010101010100000000000000000000000000000000000000  
> Iteration: 40 x = 0.33331298828125  
> Binary representation: 11111111010101010101010000000000000000000000000000000000000000  
> Iteration: 41 x = 0.6666259765625  
> Binary representation: 11111111100101010101010000000000000000000000000000000000000000  
> Iteration: 42 x = 0.333251953125  
> Binary representation: 11111111010101010101000000000000000000000000000000000000000000  
> Iteration: 43 x = 0.66650390625  
> Binary representation: 11111111100101010101000000000000000000000000000000000000000000  
> Iteration: 44 x = 0.3330078125  
> Binary representation: 11111111010101010100000000000000000000000000000000000000000000  
> Iteration: 45 x = 0.666015625  
> Binary representation: 11111111100101010100000000000000000000000000000000000000000000  
> Iteration: 46 x = 0.33203125  
> Binary representation: 11111111010101010000000000000000000000000000000000000000000000  
> Iteration: 47 x = 0.6640625  
> Binary representation: 11111111100101010000000000000000000000000000000000000000000000  
> Iteration: 48 x = 0.328125  
> Binary representation: 11111111010101000000000000000000000000000000000000000000000000  
> Iteration: 49 x = 0.65625  
> Binary representation: 11111111100101000000000000000000000000000000000000000000000000  
> Iteration: 50 x = 0.3125  
> Binary representation: 11111111010100000000000000000000000000000000000000000000000000  
> Iteration: 51 x = 0.625  
> Binary representation: 11111111100100000000000000000000000000000000000000000000000000  
> Iteration: 52 x = 0.25  
> Binary representation: 11111111010000000000000000000000000000000000000000000000000000  
> Iteration: 53 x = 0.5  
> Binary representation: 11111111100000000000000000000000000000000000000000000000000000  
> Iteration: 54 x = 0.0  
> Binary representation: 0

Why does this happen?

Because a 64-bit double uses 1 bit for the sign (+/-), 11 bits for the exponent, and 52 bits for the fraction (the mantissa, or significand). And after 52 trailing zeros have been added …

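Multiplying by 2 is exact; it only changes the exponent. But each time we subtract 1, we remove the MSB (most significant bit) of the significand, the value is renormalized, and zeros are shifted in at the LSB (least significant bit) end. That, in a nutshell, is what kills our precision: 1/3 has an infinitely repeating binary expansion (0.010101…), but only 53 bits of it fit in a double, and the loop shifts them out until nothing is left. There are ways around this (for example, BigDecimal for exact decimal arithmetic, or comparing with a tolerance instead of ==), but you have to know that you need them.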
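To make the three fields concrete, here is a small sketch that splits a double into its sign, exponent, and fraction bits. The class `DoubleBits` and the helpers `fields` and `unbiasedExponent` are hypothetical names for illustration; the only library calls are `Double.doubleToRawLongBits`, `Long.toBinaryString`, and `String.format`. It also repeats the first two loop steps (1/3 → 2/3 → 4/3, then subtract 1) so the fractions can be compared.

```java
public class DoubleBits {
    // Hypothetical helper: splits a double into IEEE 754 binary64 fields as zero-padded bit strings:
    // [0] = 1 sign bit, [1] = 11 exponent bits, [2] = 52 fraction bits.
    static String[] fields(double x) {
        long bits = Double.doubleToRawLongBits(x);
        // Long.toBinaryString drops leading zeros, so pad back to 64 characters
        String full = String.format("%64s", Long.toBinaryString(bits)).replace(' ', '0');
        return new String[] { full.substring(0, 1), full.substring(1, 12), full.substring(12) };
    }

    // Hypothetical helper: exponent field minus the bias of 1023 (meaningful for normal numbers)
    static int unbiasedExponent(double x) {
        long bits = Double.doubleToRawLongBits(x);
        return (int) ((bits >>> 52) & 0x7FF) - 1023;
    }

    public static void main(String[] args) {
        double third = 1.0 / 3.0;
        double afterTwoSteps = third * 2 * 2 - 1; // 1/3 -> 2/3 -> 4/3 -> subtract 1

        for (double x : new double[] { third, afterTwoSteps }) {
            String[] f = fields(x);
            System.out.println(x + ": sign=" + f[0] + " exponent=" + f[1]
                    + " (2^" + unbiasedExponent(x) + ") fraction=" + f[2]);
        }
    }
}
```

For 1/3 the exponent field is 01111111101 (2^-2) and the fraction is 0101 repeated; after one subtraction the exponent is unchanged, but the fraction now ends in 00, the same trailing zeros visible at iteration 2 of the output. The padding also explains why the binary strings printed by the loop start with 11111111 instead of showing all 64 bits: Long.toBinaryString drops leading zeros, which here are the sign bit and the first exponent bit.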

You can read more about it here:

http://kipirvine.com/asm/workbook/floating_tut.htm

http://support.microsoft.com/kb/42980

http://en.wikipedia.org/wiki/Single-precision_floating-point_format