CHAPTER 01

CHAPTER 01.05: FLOATING POINT REPRESENTATION: Example: Part 2 of 2

So we have calculated that -13.9 in base-10 is approximately equal to -1.101 in base-2 times 2 to the power 011 in base-2. Now what we want to see is how we are going to place this in the eight bits which we have reserved for this floating point format, and where each part goes. So we have one, two, three, four, five, six, seven, eight bits. We have already said that the first bit is assigned to the sign of the number. The second bit is assigned to the sign of the exponent. In the actual thirty-two-bit format of the IEEE standard, you don't have a bit assigned to the sign of the exponent, but in order to keep things simple and to introduce you to floating point representation, we are keeping a bit for the sign of the exponent. The next three bits we are using for the exponent itself, and the last three bits we are using for the mantissa.
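As a rough sketch of this layout in Python: the function name `encode8` is hypothetical, and the sketch assumes that extra mantissa bits are chopped rather than rounded, which is consistent with the 1.101 mantissa quoted above. It also uses 1 for a negative sign and 0 for a positive one, the convention applied in the walkthrough that follows.

```python
import math

def encode8(x):
    """Return the 8-bit string for x in the simplified format of this example.

    Layout: 1 sign bit, 1 exponent-sign bit, 3 exponent bits, 3 mantissa bits.
    Extra mantissa bits are chopped (an assumption of this sketch).
    """
    if x == 0:
        raise ValueError("zero has no leading 1, so it cannot be normalized")
    sign_bit = "1" if x < 0 else "0"
    # frexp gives abs(x) = m * 2**e with 0.5 <= m < 1; rescale so 1 <= m < 2.
    m, e = math.frexp(abs(x))
    m, e = 2 * m, e - 1
    if abs(e) > 7:
        raise ValueError("exponent does not fit in 3 bits")
    exp_sign_bit = "1" if e < 0 else "0"
    exp_bits = format(abs(e), "03b")
    # Keep only the first 3 bits after the radix point; the leading 1 is implied.
    mantissa_bits = format(int((m - 1) * 8), "03b")
    return sign_bit + exp_sign_bit + exp_bits + mantissa_bits

print(encode8(-13.9))  # 10011101
```

Reading the output left to right gives the sign of the number, the sign of the exponent, the three exponent bits, and the three stored mantissa bits.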

So let's look at how we can fill in these eight bits with the representation of -13.9. The first bit is for the sign of the number, and since the sign of the number is negative, we're going to put a 1 there; if it were a positive number, we would have put a 0 there. The second bit is for the sign of the exponent. The exponent, which you are seeing here, is a positive number, so that will be a 0 here. Then what we have is the exponent itself: the exponent is 011, so I'll put 011 right here. And then I have the mantissa, which will be 101. Again, keep in mind that we don't need to represent this 1 here before the radix point, because the digit before the radix point is automatically assumed to be non-zero, and in binary the only non-zero digit is 1. That's why we don't store this 1 in the mantissa; it's understood that there is a 1 before the radix point. So the representation of -13.9 is 10011101.

Now, as I said, this is only approximately equal to that number, so let's go ahead and see what that means. How much true error has been caused by representing the number in this eight-bit format? To find out, I'm going to start from the eight-bit representation and show you how to get back to the base-10 number. So, suppose somebody gives you an eight-bit representation like this one and says, hey, the first bit is for the sign of the number, the second one is for the sign of the exponent, the next three are for the exponent, and the last three are for the mantissa. First, look at the sign of the number: the bit is 1, so it's a negative number. Then go to the mantissa: since the mantissa is 101, I'm going to write 101 here, put the radix point here, and put the automatic 1 right here, so that is 1.101 in base-2. Then that is multiplied by 2 raised to a power. The sign-of-exponent bit is 0, so the exponent is positive, and the exponent bits are 011, so I'm going to put 011 in base-2 right there.

Now I'm going to translate it into base-10. The mantissa is 1 times 2 to the power 0, plus 1 times 2 to the power -1, plus 0 times 2 to the power -2, plus 1 times 2 to the power -3, with a negative sign in front. That is multiplied by 2 raised to the power 011, which is 0 times 2 to the power 2, plus 1 times 2 to the power 1, plus 1 times 2 to the power 0. When I calculate it, the mantissa turns out to be 1.625 in base-10, and the exponent is 3, so we have -1.625 times 2 to the power 3, and that's equal to -13 in base-10. So what you are finding out is that you start with -13.9 in base-10 and represent it in the eight-bit format as shown, but what you are actually representing is only the -13 part of that -13.9 number.
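The decoding steps just walked through can be sketched in Python. The function name `decode8` is hypothetical, and the sketch assumes the bit pattern is passed as a string of eight 0/1 characters in the order described: sign, sign of exponent, three exponent bits, three mantissa bits.

```python
def decode8(bits):
    """Decode an 8-bit string in the simplified format back to a base-10 value."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of exactly eight 0/1 characters")
    sign = -1 if bits[0] == "1" else 1
    exp_sign = -1 if bits[1] == "1" else 1
    exponent = exp_sign * int(bits[2:5], 2)
    # Restore the implied leading 1, then add the three fractional bits,
    # which carry the weights 2**-1, 2**-2 and 2**-3.
    mantissa = 1 + int(bits[5:8], 2) / 8
    return sign * mantissa * 2 ** exponent

print(decode8("10011101"))  # -13.0
```

For the pattern 10011101 the mantissa comes out as 1.625 and the exponent as 3, so the decoded value is -13 and not the original -13.9.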

So let's look at the absolute relative true error which has been created by this representation. It is the absolute value of the exact value, -13.9, minus the approximate value, -13, which is what the eight-bit format represents, divided by the exact value, -13.9. This number turns out to be 0.0647. Now the question arises: how is this number related to the machine epsilon? The machine epsilon is 2 to the power minus the number of bits used for the mantissa. You're using three bits for the mantissa, so your machine epsilon is 2 to the power -3, which is 0.125. So you are finding out that 0.0647 is less than the machine epsilon. In fact, for any number within the range of this eight-bit format, the absolute relative true error in its representation will always be less than the machine epsilon, which in this case is 0.125. And that is the end of this segment.
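The comparison above can be checked with a few lines of Python. This is only a sketch: the helper name `abs_relative_true_error` and the variable names are made up for illustration, and the numbers are the ones from this example.

```python
def abs_relative_true_error(exact, approx):
    """Absolute relative true error: |(exact - approx) / exact|."""
    return abs((exact - approx) / exact)

exact = -13.9          # the number we wanted to store
approx = -13.0         # the value the 8-bit pattern actually represents
mantissa_bits = 3      # bits reserved for the mantissa in this format

rel_error = abs_relative_true_error(exact, approx)
machine_epsilon = 2 ** -mantissa_bits

print(round(rel_error, 4))          # 0.0647
print(machine_epsilon)              # 0.125
print(rel_error < machine_epsilon)  # True
```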