Kuebiko: Floating point arithmetic in 21 century

Friday, September 8, 2017

Floating point arithmetic in 21 century

I really hope that entire concept will be dropped one day. Current approach was useful 30 years ago, now we have much more memory and we can do it in a much better way. And yet when I look at JavaScript:


0.1 + 0.2 === 0.3  //false 

(0.1 + 0.2) + 0.3 === 0.1 + (0.2 + 0.3)  //false

It is all beacause of 0.1 representation in IEEE standard.

s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm    1/n
0 01111011 10011001100110011001101
           |  ||  ||  ||  ||  || +- 8388608
           |  ||  ||  ||  ||  |+--- 2097152
           |  ||  ||  ||  ||  +---- 1048576
           |  ||  ||  ||  |+-------  131072
           |  ||  ||  ||  +--------   65536
           |  ||  ||  |+-----------    8192
           |  ||  ||  +------------    4096
           |  ||  |+---------------     512
           |  ||  +----------------     256
           |  |+-------------------      32
           |  +--------------------      16
           +-----------------------       2

The sign is positive, that's pretty easy.

The exponent is 64+32+16+8+2+1 = 123 - 127 bias = -4, so the multiplier is 2-4 or 1/16.

The mantissa is chunky. It consists of 1 (the implicit base) plus (for all those bits with each being worth 1/(2n) as n starts at 1 and increases to the right), {1/2, 1/16, 1/32, 1/256, 1/512, 1/4096, 1/8192, 1/65536, 1/131072, 1/1048576, 1/2097152, 1/8388608}.

When you add all these up, you get 1.60000002384185791015625.

When you multiply that by the multiplier, you get 0.100000001490116119384765625, which is why they say you cannot represent 0.1 exactly as an IEEE754 float.

Friday, September 8, 2017

Floating point arithmetic in 21 century

No comments: