In one of my previous articles I mentioned how floating-point numbers are stored.
So today I am going to dive deeper into the details and show you exactly how a computer does it.

1. Floating point numbers. Why we need them?
2. How the converting process happens?
3. Final Result.

Floating point numbers. Why we need them?

Since computer memory is limited, you cannot store numbers with infinite precision, no matter whether you use binary fractions or decimal ones: at some point you have to cut off. Floating-point numbers are one possible way to represent real numbers that keeps a trade-off between range and precision.

What does this mean?
It means that each floating-point number, according to the IEEE 754 standard, can be represented in the following form:

$(-1)^{sign} \times mantissa \times 2^{exponent}$

In general, there are 5 types of floating-point representations.

However, we will consider only one of them, namely single precision, which allows us to store numbers with an accuracy of 7-8 significant decimal digits (roughly from $1.18 \times 10^{-38}$ to $3.4 \times 10^{38}$ in magnitude for normalized values).
Here is a little more about how a single-precision floating-point number is organized.

It occupies 32 bits (4 bytes): 1 bit for the sign, 8 bits for the exponent and 23 bits for the mantissa.
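To make the 1 + 8 + 23 layout more tangible, here is a small sketch in Python that splits a stored value into those three fields. The function name `float32_fields` is my own, and I am assuming the standard `struct` module is used to get at the raw 32 bits; this is only one of several ways to do it.

```python
import struct

def float32_fields(value):
    """Return the (sign, exponent, mantissa) bit fields of value as a 32-bit float."""
    # Pack as big-endian single precision, then reread the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                # top 1 bit
    exponent = (bits >> 23) & 0xFF   # next 8 bits (stored form of the exponent)
    mantissa = bits & 0x7FFFFF       # low 23 bits
    return sign, exponent, mantissa

sign, exponent, mantissa = float32_fields(-0.75)
print(sign, format(exponent, "08b"), format(mantissa, "023b"))
# 1 01111110 10000000000000000000000
```

The exponent field is printed exactly as it is stored, not as the power of two itself.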

How the converting process happens?

I will take a real number (let it be 5.125) and convert it step by step, to show the whole transition of the number from decimal to binary format.

Now take a look at 5.125 and define the following parts:
Sign = 0 (means a positive number)
Fraction = 0.125 (the mantissa will be built from the binary digits of the number, see Step1)
Exponent = 2 (the power of the base): you will see later how we get it
Base = 2 (binary representation)

Eventually we will be able to see the number in exponential form and understand how the computer stores it in binary format.

Step1 (conversion of the fractional part)

Since the integer part of a normalized binary mantissa always equals 1, it is not stored: only the digits after the binary point go into the mantissa field.
Consider our 5.125 and take the fractional part, 0.125.

Now we need to convert it into a binary fraction:

• Multiply the fraction by 2
• Write down the integer part (0 or 1) as the next binary digit and remove it from the number
• Check whether the new fraction equals zero
If NO, multiply the new fraction by 2 again (Note: you can repeat until the precision limit of 23 fraction digits is reached). If YES, finish.
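The loop above can be sketched in a few lines of Python. This is just an illustration: the name `fraction_to_binary` and the `max_digits` parameter are my own, with 23 chosen as the default to match the size of the mantissa field.

```python
def fraction_to_binary(fraction, max_digits=23):
    """Convert a decimal fraction in [0, 1) to a string of binary digits."""
    digits = []
    # Stop when nothing is left or when the precision limit is reached.
    while fraction != 0 and len(digits) < max_digits:
        fraction *= 2
        bit = int(fraction)       # the integer part is the next binary digit
        digits.append(str(bit))
        fraction -= bit           # keep only the fractional part
    return "".join(digits)

print(fraction_to_binary(0.125))  # 001
```

For a fraction such as 0.1 the loop never reaches zero, so it stops at the limit and the stored value is only an approximation.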

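After following the scheme above we get:
$0.125 \times 2 = 0.25$ (digit 0)
$0.25 \times 2 = 0.5$ (digit 0)
$0.5 \times 2 = 1.0$ (digit 1), and here we terminate because the remaining fraction is zero
So the fraction 0.125 can be represented in binary as 0.001
The integer part 5 is 101 in binary, therefore $5.125_{10} = 101.001_{2}$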

Step2 (normalize the number)

It means that we need to represent the number in exponential form.
In general, you need to shift the binary point so that exactly one digit, a 1, stays in front of it, and the number takes this form: $1.xxx \times 2^{exponent}$

So first we need to shift the binary point to the left or to the right, depending on what we already have.

In our case we have 101.001, so the binary point moves 2 places to the left and the number becomes $1.01001 \times 2^{2}$.
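Here is a minimal sketch of that shift in Python, working directly on the binary digits as a string. The helper `normalize_binary` is hypothetical, and it assumes a positive, nonzero input written with a binary point, like the one we built in Step1.

```python
def normalize_binary(binary):
    """Normalize a positive binary string such as '101.001' to (significand, exponent)."""
    integer_part, _, fraction_part = binary.partition(".")
    digits = integer_part + fraction_part
    first_one = digits.index("1")                  # position of the leading 1
    # Positive when the point moves left, negative when it moves right.
    exponent = len(integer_part) - 1 - first_one
    rest = digits[first_one + 1:].rstrip("0")      # digits after the leading 1
    return "1." + (rest or "0"), exponent

print(normalize_binary("101.001"))  # ('1.01001', 2)
```

A number smaller than 1 gives a negative exponent, for example `normalize_binary("0.001")` returns `('1.0', -3)`.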

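Step3 (find the biased exponent)
The exponent is not stored as is: in single precision an offset (bias) of 127 is added to it, so that negative exponents can be stored as unsigned values. So we need to do the following:

Biased exponent = 127 + 2 = 129
After converting this to binary we get 10000001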
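The same calculation as a short Python sketch. The names `EXPONENT_BIAS` and `biased_exponent_bits` are mine; the range check reflects the fact that the stored values 0 and 255 are reserved for special cases (zero and subnormal numbers, infinity and NaN), so I treat them as out of scope here.

```python
EXPONENT_BIAS = 127  # offset used by single precision

def biased_exponent_bits(exponent):
    """Return the 8-bit stored form of an exponent for a normalized number."""
    stored = exponent + EXPONENT_BIAS
    if not 1 <= stored <= 254:
        raise ValueError("exponent is outside the normalized single-precision range")
    return format(stored, "08b")

print(biased_exponent_bits(2))  # 10000001
```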

Final Result.

So what exactly do we have? Our number 5.125 looks like this in exponential form: $1.01001_{2} \times 2^{2}$, and it is represented in binary like this:

0 10000001 01001000000000000000000 (sign, exponent, mantissa)
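If you want to double-check the result, the sketch below glues the three fields from the steps above together and compares them with the bits that Python's `struct` module produces for 5.125. The variable names are my own, and this is only a sanity check, not a full converter.

```python
import struct

sign_bit = "0"                          # positive number
exponent_bits = "10000001"              # 127 + 2 = 129
mantissa_bits = "01001".ljust(23, "0")  # digits after "1.", padded to 23 bits

assembled = sign_bit + exponent_bits + mantissa_bits

# Ask the machine for the real single-precision bits of 5.125.
(stored,) = struct.unpack(">I", struct.pack(">f", 5.125))

print(assembled)                           # 01000000101001000000000000000000
print(format(stored, "032b") == assembled) # True
```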

I hope this information was helpful for you. Feel free to correct me, I will appreciate it.