In one of my previous articles I mentioned how floating-point numbers are stored.
Today I am going to dive into the details and show you exactly how a computer does it.
1. Floating-point numbers. Why do we need them?
2. Details about number representation.
3. How does the conversion happen?
4. Final result.
Floating-point numbers. Why do we need them?
Since computer memory is limited, you cannot store numbers with infinite precision, no matter whether you use binary fractions or decimal ones: at some point you have to cut off. Floating-point numbers are one possible way to represent real numbers that trades off range against precision.
What does this mean?
It means that each floating-point number, according to the IEEE 754 standard, can be represented in the following form:
(-1)^sign × mantissa × base^exponent
Details about number representation.
In general, there are 5 types of floating-point representation.
However, we will consider only one of them, namely single precision, which lets us store numbers with an accuracy of about 7 to 8 decimal digits (magnitudes from roughly 1.2 × 10^-38 to 3.4 × 10^38).
Here is a little more about how a single-precision floating-point number is organized.
It occupies 32 bits (4 bytes): 1 bit for the sign, 8 bits for the exponent, and 23 bits for the mantissa.
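To make the 1 + 8 + 23 bit layout concrete, here is a minimal Python sketch that splits a number's single-precision encoding into its three fields. The function name float32_fields is made up for this illustration; only the standard struct module is used.

```python
import struct

def float32_fields(value):
    """Split a number's single-precision encoding into (sign, exponent, mantissa)."""
    # Pack as a big-endian 32-bit float, then reinterpret the same 4 bytes
    # as an unsigned 32-bit integer so we can use bit operations on it.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31               # top 1 bit
    exponent = (bits >> 23) & 0xFF  # next 8 bits (stored with an offset of 127)
    mantissa = bits & 0x7FFFFF      # low 23 bits (fraction part only)
    return sign, exponent, mantissa

sign, exponent, mantissa = float32_fields(-0.75)
print(sign, format(exponent, "08b"), format(mantissa, "023b"))
```

The shifts and masks simply follow the field widths given above, so the same approach should work for any value that fits in single precision.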
How does the conversion happen?
Let's take a number (say 5.125) and convert it step by step, to show the whole transition from decimal to binary format.
Now take a look at 5.125 and define the following:
Sign = 0 (means a positive number)
Fraction = 0.125 (the fractional part, which we convert to binary first)
Exponent = 2 (the power of the base); you will see later how we get this
Base = 2 (binary representation)
Eventually we will see the number in exponential form and understand how the computer stores it in binary format.
Step 1 (conversion of the fractional part)
Since the integer part of a normalized binary mantissa always equals 1, we store only the fractional part in the mantissa.
Consider our 5.125 and take the fractional part, 0.125.
Now we need to convert it into a binary fraction:
 Multiply the fraction by 2
 Take the integer part (0 or 1) as the next binary digit and drop it
 Check whether the new fraction is zero
If NO, multiply the new fraction by 2 again (note: you can repeat until the precision limit of 23 fraction digits is reached). If YES, finish.
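Following the scheme above, we get:
0.125 × 2 = 0.25, integer part 0
0.25 × 2 = 0.5, integer part 0
0.5 × 2 = 1.0, integer part 1, and the remaining fraction is 0, so we terminate here
So the fraction 0.125 can be represented in binary as 0.001.
Therefore 5.125 in decimal is 101.001 in binary (5 is 101 and 0.125 is 0.001).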
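The multiply-by-2 procedure from this step can be sketched in Python as follows. This is only an illustration: the name fraction_to_binary and the max_bits parameter are my own, with max_bits defaulting to the 23 fraction digits of single precision.

```python
def fraction_to_binary(fraction, max_bits=23):
    """Convert a fraction in [0, 1) to a string of binary digits."""
    digits = []
    # Stop when nothing is left or the precision limit is reached.
    while fraction != 0 and len(digits) < max_bits:
        fraction *= 2
        integer_part = int(fraction)   # 0 or 1: the next binary digit
        digits.append(str(integer_part))
        fraction -= integer_part       # get rid of the integer part
    return "".join(digits)

print(fraction_to_binary(0.125))  # 001
```

For a fraction such as 0.1, which has no finite binary expansion, the loop does not reach zero and simply stops at max_bits digits; that is exactly the cut-off discussed at the beginning of the article.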
Step 2 (normalize the number)
It means that we need to represent the number in exponential form.
In general, you need to shift the binary point so that the number has this form:
1.fraction × 2^exponent
So first we need to shift the point left or right, depending on what we already have.
In our case we have 101.001, so the point is shifted 2 places to the left and the number becomes 1.01001 × 2^2.
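Here is a small sketch of the point shift for our case. It is deliberately simplified: the hypothetical normalize function takes the integer and fractional binary digits as strings and assumes the integer part starts with 1 (that is, the value is at least 1), so the point only ever moves to the left.

```python
def normalize(integer_bits, fraction_bits):
    """Return (mantissa_fraction_bits, exponent) for a binary number >= 1."""
    # Assumption: integer_bits starts with '1', so the point moves left
    # by one place for every integer digit after the leading 1.
    exponent = len(integer_bits) - 1
    # Everything after the leading 1 becomes the fraction of the mantissa.
    mantissa_fraction = integer_bits[1:] + fraction_bits
    return mantissa_fraction, exponent

fraction, exponent = normalize("101", "001")
print(f"1.{fraction} x 2^{exponent}")  # 1.01001 x 2^2
```

Numbers smaller than 1 would need the point moved to the right (a negative exponent), which this sketch does not handle.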
Step 3 (find the biased exponent)
The exponent is stored with an offset (bias) of 127, so we need to compute:
Biased exponent = 127 + 2 = 129
Converting this to binary gives 10000001.
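The same arithmetic as a short Python sketch. BIAS and biased_exponent_bits are names chosen for this example; 127 is the single-precision offset used above.

```python
BIAS = 127  # exponent offset for single precision

def biased_exponent_bits(exponent):
    """Add the bias and return the result as an 8-bit binary string."""
    return format(exponent + BIAS, "08b")

print(biased_exponent_bits(2))  # 10000001
```

The "08b" format pads with leading zeros, so smaller exponents still fill all 8 bits of the exponent field.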
Final Result.
So what exactly do we have? Our number 5.125 looks like this in exponential form:
1.01001 × 2^2
and is represented in binary like this (sign, exponent, mantissa):
0 10000001 01001000000000000000000
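If you want to check the result yourself, one way (a sketch using Python's standard struct module; float32_bit_string is just an illustrative name) is to ask the machine for the 32 bits it actually stores for 5.125:

```python
import struct

def float32_bit_string(value):
    """Return the 32 bits of a number's single-precision encoding as a string."""
    # Pack as a big-endian 32-bit float and read the bytes back as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return format(bits, "032b")

pattern = float32_bit_string(5.125)
# Split into sign (1 bit), exponent (8 bits), and mantissa (23 bits).
print(pattern[0], pattern[1:9], pattern[9:])
# 0 10000001 01001000000000000000000
```

The three groups match the sign, the biased exponent 129, and the fraction bits 01001 padded with zeros up to 23 digits.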
I hope this was helpful. Feel free to correct me; I will appreciate it.

Rav Gupta