In one of my previous articles I mentioned how floating-point numbers are stored.

Today I am going to dive deeper into the details and show you exactly how a computer does it.

1. Floating-point numbers: why do we need them?

2. Details about number representation.

3. How does the conversion happen?

4. Final Result.

## Floating-point numbers: why do we need them?

Since computer memory is limited, you cannot store numbers with infinite precision, no matter whether you use binary or decimal fractions: at some point you have to cut off. Floating-point numbers are one possible way to represent real numbers while keeping a trade-off between range and precision.
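To see this cut-off in practice, here is a small Python sketch. Note that Python's `float` is an IEEE 754 double, not the single precision discussed below, but the effect is the same. Converting a float to `decimal.Decimal` reveals the exact binary value that was actually stored for the literal `0.1`:

```python
from decimal import Decimal

# 0.1 has an infinite binary expansion, so the stored double
# is only the nearest representable value, not 0.1 itself.
stored = Decimal(0.1)             # exact value of the double closest to 0.1
print(stored)                     # 0.1000000000000000055511151231257827021181583404541015625
print(stored == Decimal("0.1"))   # False
```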

What does this mean?

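It means that each floating-point number, according to the IEEE 754 standard, can be represented in the following form:

$$x = (-1)^{S} \times M \times 2^{E}$$

where $S$ is the sign bit, $M$ is the mantissa (for normalized numbers $1 \le M < 2$) and $E$ is the exponent.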
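A minimal Python sketch of the form $x = (-1)^{S} \times M \times 2^{E}$, assuming a nonzero, finite, normalized value. The helper name `decompose` is hypothetical; it relies on the standard `math.frexp`, which returns a mantissa in the range [0.5, 1), and rescales it so that $1 \le M < 2$:

```python
import math

def decompose(x: float) -> tuple[int, float, int]:
    """Split a nonzero finite float into (S, M, E) with x == (-1)**S * M * 2**E and 1 <= M < 2."""
    sign = 1 if math.copysign(1.0, x) < 0 else 0
    m, e = math.frexp(abs(x))   # abs(x) == m * 2**e with 0.5 <= m < 1
    return sign, m * 2, e - 1   # double m and lower e by one so that 1 <= M < 2

print(decompose(5.125))   # (0, 1.28125, 2)
print(decompose(-0.75))   # (1, 1.5, -1)
```

For 5.125 this gives $M = 1.28125$ and $E = 2$, because $1.28125 \times 2^{2} = 5.125$.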

## Details about number representation.

In general, there are 5 types of floating-point representations.

However, we will consider only one of them, namely **single precision**, which stores numbers with an accuracy of about 7 significant decimal digits (normalized values range in magnitude from about $1.18 \times 10^{-38}$ to about $3.4 \times 10^{38}$).
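The limited precision can be observed by rounding a Python float (a double) to single precision with the standard `struct` module; `to_float32` is a hypothetical helper name used only for illustration. Single precision keeps 24 significant bits (23 stored plus an implicit leading 1), so $2^{24} + 1$ is the first positive integer it cannot hold exactly:

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value and back."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

print(to_float32(16777216.0))        # 16777216.0  (2**24 fits exactly)
print(to_float32(16777217.0))        # 16777216.0  (2**24 + 1 needs 25 significant bits)
print(to_float32(3.14159265358979))  # 3.1415927410125732  (only ~7 digits survive)
```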

A little more about how a **single-precision** floating-point number is organized.

It occupies **32 bits (4 bytes)**: **1 bit** for the **sign**, **8 bits** for the **exponent** and **23 bits** for the **mantissa**.
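This layout can be inspected directly. The sketch below (the helper name `float32_fields` is hypothetical) packs a value as a big-endian single-precision float with the standard `struct` module, reinterprets the 4 bytes as an unsigned integer and masks out the three fields:

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Return the (sign, exponent, mantissa) bit fields of x rounded to single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # same 4 bytes read as uint32
    sign = bits >> 31                # top 1 bit
    exponent = (bits >> 23) & 0xFF   # next 8 bits
    mantissa = bits & 0x7FFFFF       # low 23 bits
    return sign, exponent, mantissa

s, e, m = float32_fields(5.125)
print(s, format(e, "08b"), format(m, "023b"))
# 0 10000001 01001000000000000000000
```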

## How does the conversion happen?

Let's take a number (let it be **5.125**) and convert it step by step to show the whole transition from decimal to binary format.

Now look at **5.125** and identify the following parts:

**Sign** = 0 (a positive number)

**Mantissa**: built from the fractional part **0.125** together with the integer part **5** (the exact bits are worked out below)

**Exponent** = 2 (the power of two; you will see later how we get it)

**Base** = 2 (binary representation)

Eventually we will see the number in **exponential form** and understand how the computer stores it in **binary format**.

**Step 1** (**conversion** of the fractional part)

Since the **integer part** of a normalized binary mantissa always equals **1**, it is not stored: only the **fractional part** goes into the mantissa.

Take the **fractional** part of **5.125**, which is **0.125**.

Now we need to convert it into a binary fraction:

- Multiply the fraction by 2
- Write down the integer part (0 or 1): it is the next binary digit
- Get rid of the integer part and check whether the new fraction equals **zero**

If **NO**, multiply the new fraction by 2 again (**Note:** you can repeat until the precision limit of 23 fraction digits is reached). If **YES**, finish.
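The scheme above can be sketched in Python as follows. `fraction_to_binary` is a hypothetical helper, and `max_digits=23` mirrors the 23-digit limit from the note; it caps digits after the binary point and simply truncates, without rounding:

```python
def fraction_to_binary(frac: float, max_digits: int = 23) -> str:
    """Convert 0 <= frac < 1 to binary digits by repeated multiplication by 2."""
    digits = []
    while frac != 0 and len(digits) < max_digits:
        frac *= 2
        bit = int(frac)          # the integer part (0 or 1) is the next binary digit
        digits.append(str(bit))
        frac -= bit              # get rid of the integer part
    return "".join(digits)

print(fraction_to_binary(0.125))  # 001
print(fraction_to_binary(0.1))    # 00011001100110011001100 (never reaches zero, cut at 23)
```

Multiplying by 2 and subtracting 0 or 1 are exact operations on binary floats, so the loop itself introduces no extra rounding error.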

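Following the scheme above, we get:

- 0.125 × 2 = **0**.25, so the digit is 0
- 0.25 × 2 = **0**.5, so the digit is 0
- 0.5 × 2 = **1**.0, so the digit is 1; the fraction is now zero, so here the process terminates

So the fraction **0.125** can be represented in binary as **0.001**.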
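The integer part **5** has to be converted to binary too. A common method, sketched here with the hypothetical helper `integer_to_binary`, is repeated division by 2, collecting the remainders from least to most significant digit:

```python
def integer_to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))   # the remainder is the next digit, least significant first
        n //= 2
    return "".join(reversed(digits))

print(integer_to_binary(5))                  # 101
print(integer_to_binary(5) == bin(5)[2:])    # True, matches Python's built-in bin()
```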

Since the integer part **5** is **101** in binary, we get:

$$5.125_{10} = 101.001_2$$

**Step 2** (**normalize** the number)

It means that we need to represent the number in exponential form. You can read more details here.

In general, you need to **shift** the binary point so that the number takes this form:

$$1.xxx\ldots_2 \times 2^{E}$$

So first we shift the point **left** or **right**, depending on where it currently is: to the left for numbers of 2 or more, to the right for numbers below 1.

In our case we have **101.001**, so the binary point is shifted to the **left** by **2** places, which gives:

$$101.001_2 = 1.01001_2 \times 2^{2}$$
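The shifting can be sketched on a binary string. `normalize` is a hypothetical helper that assumes a positive number written as a string of `0`/`1` digits with an optional binary point; it returns the fraction bits after the leading 1 and the exponent, where a positive exponent means the point moved left:

```python
def normalize(binary: str) -> tuple[str, int]:
    """Rewrite a positive binary string like '101.001' as 1.fraction * 2**exponent."""
    int_part, _, frac_part = binary.partition(".")
    digits = int_part + frac_part
    first_one = digits.index("1")              # position of the leading 1 (ValueError if zero)
    exponent = len(int_part) - 1 - first_one   # places the point moves: >0 left, <0 right
    fraction = digits[first_one + 1:].rstrip("0")
    return fraction, exponent

print(normalize("101.001"))  # ('01001', 2)   -> 1.01001 * 2**2
print(normalize("0.001"))    # ('', -3)       -> 1.0 * 2**-3
```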

**Step 3** (find the **biased exponent**)

The 8-bit exponent field stores the exponent with an offset (bias) of 127, so we add 127 to our exponent:

**Biased exponent** = 127 + 2 = 129

After converting this to binary we get **10000001**.
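The bias step in Python, as a small sketch; `biased_exponent_bits` is a hypothetical name. For normalized single-precision numbers the exponent ranges from -126 to 127, so the stored value stays between 1 and 254 (0 and 255 are reserved for zeros/subnormals and infinities/NaN):

```python
BIAS = 127  # single-precision exponent bias

def biased_exponent_bits(exponent: int) -> str:
    """Return the 8-bit stored exponent field for a normalized single-precision value."""
    stored = exponent + BIAS
    if not 1 <= stored <= 254:
        raise ValueError("exponent out of range for a normalized single-precision number")
    return format(stored, "08b")

print(biased_exponent_bits(2))    # 10000001
print(biased_exponent_bits(-3))   # 01111100
```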

## Final Result.

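So what exactly do we have? In exponential form our number **5.125** looks like this:

$$5.125 = (-1)^{0} \times 1.01001_2 \times 2^{2}$$

and in binary (sign | exponent | mantissa) it is represented like this:

`0 | 10000001 | 01001000000000000000000`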
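As a final check, a minimal Python sketch can assemble the sign, the exponent and the fraction bits of 5.125 into one 32-bit pattern and let the standard `struct` module decode it back. `assemble_float32` is a hypothetical helper; it only pads the fraction with zeros and does not round, so it assumes the fraction fits in 23 bits, as it does here:

```python
import struct

def assemble_float32(sign: int, exponent: int, fraction: str) -> int:
    """Pack sign, unbiased exponent and fraction bits (after the leading 1) into 32 bits."""
    mantissa = fraction.ljust(23, "0")[:23]          # pad to 23 bits, no rounding
    bits = f"{sign}{exponent + 127:08b}{mantissa}"   # sign | biased exponent | mantissa
    return int(bits, 2)

pattern = assemble_float32(0, 2, "01001")
print(f"{pattern:032b}")   # 01000000101001000000000000000000
print(hex(pattern))        # 0x40a40000
print(struct.unpack(">f", pattern.to_bytes(4, "big"))[0])  # 5.125
```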

I hope this information was helpful. Feel free to correct me; I will appreciate it.