CHAPTER 01.05: FLOATING POINT REPRESENTATION: Example: Part 1 of 2
In this segment, we're going to work through an example of floating point representation. We'll use a hypothetical example, because a regular single-precision number uses thirty-two bits: certain bits for the sign of the number, certain bits for the exponent, and certain bits for the mantissa. Thirty-two bits might be too many to write down on the board, so I'm going to use a hypothetical, shorter word for our floating point representation. Still, you should understand how numbers are really stored in real life: in single precision you use thirty-two bits for each number. So let's assume we have a floating point representation that uses an eight-bit word, and let me number these bits one, two, three, four, five, six, seven, and eight.
Let's suppose the first bit is used for the sign of the number, and the second bit is used for the sign of the exponent. In real life there is generally no bit for the sign of the exponent, because a biased exponent is used instead; we'll take that up in another segment. But to keep things simple, so that we can understand what floating point representation is all about, we are using one bit for the sign of the exponent. The next three bits are used for the exponent itself, and the remaining three bits are used for the mantissa. So in our eight-bit word, the first bit is the sign of the number, the second bit is the sign of the exponent, the next three bits are the exponent, and the last three bits are the mantissa, for a total of eight bits.

Now suppose somebody asks you to put -13.9 (base-10) into this floating point representation. Can you represent -13.9 when the first bit is the sign of the number, the second bit is the sign of the exponent, the next three bits are the exponent, and the last three bits are the mantissa? Let's see how we can go about doing that. The first thing we have to do is convert the integer part, 13, into base-2, and 13 in base-2 is 1101. I'm going to leave this as homework, because we already talked about binary representation: go ahead and show that 13 is the same as 1101 base-2.
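If you want to check the homework on a computer, here is a minimal sketch of the usual repeated-division method for converting a non-negative integer to base-2. The function name `int_to_binary` is a hypothetical helper introduced for illustration, not something from the lecture.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to a base-2 digit string
    by repeated division by 2 (hypothetical helper for illustration)."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # remainder is the next bit, least significant first
        n //= 2                    # continue with the quotient
    return "".join(reversed(digits))  # read the remainders from last to first

print(int_to_binary(13))  # 1101
```

Python's built-in `bin(13)` returns `'0b1101'`, which you can use as an independent check.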
Similarly, 0.9 in base-10 is approximately 0.11100 base-2. This is only an approximation because, as we learned in binary representation, 0.9 cannot be represented exactly in base-2; 11100 are just the first five bits after the radix point. Again, I'm going to leave this as your homework. Once you have done these two pieces of homework, you can write -13.9 base-10 as approximately -1101.11100 base-2, where the minus sign carries over from the original number.
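For the fractional part, the standard method is repeated multiplication by 2: the integer part of each product is the next bit. The sketch below is a minimal illustration, assuming exact arithmetic with Python's `fractions.Fraction` so that 0.9 is not first rounded to a binary float. `frac_to_binary` is a hypothetical helper, and it chops (does not round) the expansion.

```python
from fractions import Fraction

def frac_to_binary(x: Fraction, n_bits: int) -> str:
    """Return the first n_bits of the base-2 expansion of 0 <= x < 1
    by repeated multiplication by 2 (hypothetical helper; digits are chopped)."""
    bits = []
    for _ in range(n_bits):
        x *= 2
        bit = 1 if x >= 1 else 0   # integer part of the product is the next bit
        bits.append(str(bit))
        x -= bit                   # keep only the fractional part for the next step
    return "".join(bits)

print(frac_to_binary(Fraction(9, 10), 5))   # 11100
print(frac_to_binary(Fraction(9, 10), 13))  # 1110011001100
```

Asking for more bits shows the pattern 1100 repeating forever, which is why 0.9 has no exact finite base-2 representation.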
Now, to write down the floating point representation, I have to write this number as -1.10111100 base-2 × 2^3. What I've done is move the radix point three places to the left, because in this representation the digit before the radix point has to be a 1. In base-2 that leading digit can only be 1: it cannot be 0, and it cannot be more than 1. Moving the radix point gives 1, radix point, then 101 from the integer part, followed by 11100 from the fractional part, and all of that is multiplied by 2^3 because the radix point moved three places to the left. Next, I'm going to approximate this as -1.101 base-2 × 2^3, because I only have three places for the mantissa. Rather than keeping the longer version of the number, I simply drop the remaining bits, because I cannot represent them anyway. So the three bits after the radix point, 101, are stored in the mantissa. The 1 before the radix point does not need to be stored, because every number in this representation has a 1 before the radix point. Now, I already know the number is negative, so I'll have a 1 in the sign bit, and I'll have 101 in the mantissa. I still have to figure out the sign of the exponent and the value of the exponent. Since the exponent in 2^3 is positive, I'll have a 0 in the sign-of-exponent bit. That leaves the 3 itself, and 3 is 11 base-2.
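The normalization and chopping step can be sketched as follows. This is a minimal illustration under stated assumptions: the number is at least 1 (so its integer bits start with 1 and the exponent is non-negative), and extra bits are chopped as in the lecture rather than rounded. `normalize` is a hypothetical helper name.

```python
def normalize(int_bits: str, frac_bits: str, mantissa_bits: int):
    """Rewrite int_bits.frac_bits as 1.m x 2**e and keep only `mantissa_bits`
    bits of m (hypothetical helper; assumes int_bits starts with '1', chops extra bits)."""
    exponent = len(int_bits) - 1                 # places the radix point moves to the left
    digits_after_point = int_bits[1:] + frac_bits  # everything after the leading 1
    mantissa = digits_after_point[:mantissa_bits]  # drop bits that do not fit
    return mantissa, exponent

mantissa, exponent = normalize("1101", "11100", 3)
print(mantissa, exponent)  # 101 3
```

The leading 1 is not returned as part of the mantissa, matching the point that it never needs to be stored.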
Since I have three bits for the exponent, I'm going to store it as 011 base-2. Again, I'll leave it as homework for you to show why 3 is 11 base-2. The reason for the 0 is that 11 base-2 fills only two of the three exponent places, so I put a 0 in front to fill the third place; a leading 0 does not change the value, which is still 3. So the exponent bits are 011. Now let me write down the whole number in base-2 format, which will make it very clear which bits go where.
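Putting the pieces together, here is a minimal sketch that packs the hypothetical eight-bit format from this segment (1 sign bit, 1 exponent-sign bit, 3 exponent bits, 3 mantissa bits with an implied leading 1) and decodes it again. `encode_word` and `decode_word` are hypothetical helpers for this classroom format only; they do not model IEEE 754, which uses a biased exponent instead of an exponent-sign bit.

```python
def encode_word(negative: bool, exponent: int, mantissa: str) -> str:
    """Pack the hypothetical 8-bit word: sign, exponent sign, 3 exponent bits,
    3 mantissa bits (the leading 1 before the radix point is implied, not stored)."""
    sign_bit = "1" if negative else "0"
    exp_sign_bit = "1" if exponent < 0 else "0"
    exp_bits = format(abs(exponent), "03b")  # zero-pad to 3 bits, e.g. 3 -> '011'
    if len(exp_bits) > 3 or len(mantissa) != 3:
        raise ValueError("exponent or mantissa does not fit in 3 bits")
    return sign_bit + exp_sign_bit + exp_bits + mantissa

def decode_word(word: str) -> float:
    """Recover the value stored in the hypothetical 8-bit word."""
    sign = -1 if word[0] == "1" else 1
    exp = int(word[2:5], 2) * (-1 if word[1] == "1" else 1)
    significand = 1 + int(word[5:8], 2) / 8  # implied 1 plus three fraction bits
    return sign * significand * 2 ** exp

word = encode_word(negative=True, exponent=3, mantissa="101")
print(word)               # 10011101
print(decode_word(word))  # -13.0
```

Decoding the word gives -1.101 base-2 × 2^3 = -13, not -13.9: the bits dropped from the mantissa are exactly where the approximation error comes from.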