|
CHAPTER 01.05: FLOATING POINT REPRESENTATION: Floating Point Example

So what we have is nine bits for the hypothetical floating point
number. The first bit is for the sign of the number, the second bit is for the
sign of the exponent, the next three bits are for the magnitude of the
exponent, and the last four bits are for the magnitude of the mantissa. So
what we want to do is take this number, eleven point eight base ten, and
write it in this floating point format, which follows that convention. In
order to do that, the first thing we have to see is how we can write eleven
in base two and how we can write zero point eight in base two. So eleven in
base ten is one zero one one in base two. You can do this as homework,
because the previous video covers that already. Zero point eight base ten
will be zero radix point one one zero zero one, and it keeps on going, in
base two, and you can also do this as homework, as it was covered in the
previous video. So if
we want to see eleven point eight base ten written as a base two number, then
it is one zero one one, which is the equivalent of eleven, then the radix
point, and then the equivalent of point eight in base ten, which is the one
one zero zero one in base two that we just showed. So once we have that, we
have to take this radix point and move it to just after the leading one,
because we only want one non-zero digit before the radix point. So this is
one radix point zero one one one one zero zero one, base two, times two to
the power three. The reason it is two to the power three is that the radix
point was moved to the left by three places. So what we are going to do
is do this in two stages. We first want to see how many bits of the mantissa
we can take, since there are only four bits for the mantissa. We can only use
the first four bits after the radix point, zero one one one, so we have one
point zero one one one base two, and we are going to forget about the rest,
because they cannot be represented with only four bits for the mantissa. That
is still multiplied by two to the power three. Now the three in two to the
power three needs to be written in base two, so this three, which we have in
base ten, becomes one one base two. But again we want to make a small change:
we want to say that this is multiplied by two to the power zero one one base
two. The reason we write zero one one rather than one one base two is that we
have three bits available for the magnitude of the exponent. So let’s
repeat this. Eleven point eight base ten is now written, with the mantissa
cut to four bits, as one point zero one one one base two times two to the
power zero one one base two. And now we have to assign it to the nine bits of
this hypothetical floating point word, and we want to see how we are going to
go about doing that. So here are the nine bits. The first is for the sign of
the number, the second is for the sign of the exponent, the next three are
for the magnitude of the exponent, and the last four are for the magnitude of
the mantissa. So what we are going to do is we are going
to start now filling in these places for the bits with zeros and ones. The
sign of the number is positive, so we put zero in the first bit. The sign of
the exponent is positive, so we put zero in the second bit.
The magnitude of the exponent is zero one one, and then we have these four
bits to put in the magnitude of the mantissa, zero one one one. We don’t
store the one before the radix point, because it is already assumed: you need
a non-zero digit before the radix point, and the only non-zero digit in
binary is one, so we don’t need to represent it. It’s there, but we don’t
need to represent it, because it will always be one. So this is the
representation of the number eleven point eight base ten in this hypothetical
nine bit floating point format: zero, zero, zero one one, zero one one one.
Now if somebody says that hey this is the
representation, how would you write this number in base two, you say okay, I
first write plus, because the sign of the number is positive. Then I write
one, then the radix point, then the four digits of the mantissa, zero one one
one, base two, times two to the power, then a plus because the sign of the
exponent is positive, and then the three bits of the magnitude of the
exponent, zero one one base two. So go and see what this is equivalent to in
base ten, and you will see that it is not equal to eleven point eight in base
ten. The difference between the number you get and eleven point eight tells
you the round-off error caused by using this hypothetical nine bit word for
our floating point representation. And that is the end of this segment. |
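
The steps in this segment can be checked with a short Python sketch. It is only an illustration under stated assumptions, not part of the original lecture: the function names encode and decode are made up here, and a sign bit of 1 for negative values is an assumption, since the segment only shows that a positive sign is stored as 0. The mantissa is chopped to four bits, as in the segment, rather than rounded, and exact fractions are used so that Python's own floating point does not blur the result.

```python
from fractions import Fraction

EXP_BITS = 3  # bits for the magnitude of the exponent
MAN_BITS = 4  # bits for the magnitude of the mantissa


def encode(x):
    """Pack x into the 9-bit word, chopping the mantissa to four bits."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("zero has no normalized form 1.xxxx times 2^e")
    sign_bit = "1" if x < 0 else "0"  # assumption: 1 marks a negative sign
    m = abs(x)
    e = 0
    # Move the radix point until exactly one non-zero digit precedes it.
    while m >= 2:
        m /= 2
        e += 1
    while m < 1:
        m *= 2
        e -= 1
    if abs(e) > 2**EXP_BITS - 1:
        raise OverflowError("exponent does not fit in three bits")
    exp_sign_bit = "1" if e < 0 else "0"  # assumption: 1 marks a negative sign
    # The leading 1 is not stored; keep the first four bits after the point.
    frac = m - 1
    man_bits = ""
    for _ in range(MAN_BITS):
        frac *= 2
        bit = int(frac)  # 0 or 1
        man_bits += str(bit)
        frac -= bit
    return sign_bit + exp_sign_bit + format(abs(e), "03b") + man_bits


def decode(word):
    """Return the exact value stored in a 9-bit word as a Fraction."""
    sign = -1 if word[0] == "1" else 1
    exp_sign = -1 if word[1] == "1" else 1
    e = exp_sign * int(word[2:5], 2)
    # Put the assumed leading 1 back in front of the four mantissa bits.
    mantissa = 1 + Fraction(int(word[5:9], 2), 2**MAN_BITS)
    return sign * mantissa * Fraction(2) ** e


word = encode("11.8")
stored = decode(word)
print(word)                              # 000110111
print(float(stored))                     # 11.5
print(float(Fraction("11.8") - stored))  # 0.3
```

Under these assumptions the stored word decodes to eleven point five, so the round-off error for eleven point eight in this nine bit format is zero point three.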