CHAPTER 01

CHAPTER 01.05: FLOATING POINT REPRESENTATION: IEEE-754 Single Precision Representation: Part 2 of 2

So let's go ahead and look at an example.

Suppose somebody puts these numbers into the thirty-two bits: a 1 in the sign bit, 10100010 in the biased exponent, and in the mantissa a 1, a 0, a 1, and then all the rest of the bits are 0. I intentionally set the rest of them to 0, because I didn't want to show you all the algebra corresponding to that, which would have taken quite a bit. That's why I'm showing only 101 in the first three bits there. Now, what value does this represent? The value is -1 raised to the power s, and since the sign bit s is 1, it is a negative number. The 1 before the radix point is already assumed. Now, what is the mantissa? The mantissa is 101 followed by all 0s, so let me just cross those off, because otherwise I'd have to write twenty more 0s there. So the number is 1.101 in base-2. For the biased exponent I have the eight bits 10100010, and since it is a biased exponent, I'm going to subtract 127 from it. The biased exponent translated into base-10 is 162; subtracting 127 gives 35, so I get 2 raised to the power 35. The number 1.101 in base-2 turns out to be 1.625 in base-10. When I do all these multiplications, the number represented by those bits in the thirty-two-bit single-precision representation is -1.625 times 2 raised to the power 35, which is about -5.5835 times 10 raised to the power 10.

Now, somebody might say, hey, can you do the translation in reverse, where the number is given to you in base-10 and you convert it into base-2 in this thirty-two-bit representation? What you've got to understand is that I can directly see that I will have -1 raised to the power 1, that is, the sign bit is going to be 1, because it's a negative number.
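Before going further with the reverse conversion, the decoding of the example bits can be checked with a short Python sketch. The variable names are only illustrative and are not from the lecture. The sketch applies the formula by hand and then cross-checks the result with Python's standard `struct` module, assuming the usual layout of 1 sign bit, 8 exponent bits, and 23 mantissa bits.

```python
import struct

# Bit fields from the example (names are illustrative, not from the lecture).
sign_bit = 1
exponent_bits = "10100010"
mantissa_bits = "101" + "0" * 20  # 23 bits in total

# Biased exponent in base-10, then remove the bias of 127.
biased_exponent = int(exponent_bits, 2)   # 162
exponent = biased_exponent - 127          # 35

# The leading 1 is assumed; each mantissa bit i contributes 2^-(i+1).
significand = 1 + sum(
    int(bit) * 2.0 ** -(i + 1) for i, bit in enumerate(mantissa_bits)
)                                          # 1.625

value = (-1) ** sign_bit * significand * 2.0 ** exponent

# Cross-check: pack the same 32 bits as an unsigned integer and
# reinterpret them as a single-precision float.
word = int(str(sign_bit) + exponent_bits + mantissa_bits, 2)
check = struct.unpack(">f", struct.pack(">I", word))[0]

print(value, check)
```

Both printed values should agree, since the hand formula and the machine's own interpretation of the bits describe the same number.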
Then I have to figure out what the mantissa will be, what the exponent will be, and whether the exponent will be negative or positive. Depending on that, I convert the exponent into a biased exponent, which is what takes care of both positive and negative exponents. I'm not going to show you how to do this; I'm going to leave it as homework. It is also a good program to develop: somebody gives you a number between the smallest and the largest number that can be represented in thirty-two bits, and asks you to find the mantissa, the sign of the exponent, and the biased exponent.

Now, going back to the biased exponent: the eight bits we have represent numbers between 0 and 255. The bias is 127, so you subtract 127 to get the real exponent being represented, which would run from -127 to 128. But that's not the end of the story. The reason is that the range of the biased exponent is actually not 0 to 255 but only 1 to 254, so something is happening to the biased exponent values of 0 and 255. The range for the actual exponent is then 127 subtracted from those limits, which gives -126 to 127.
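One way to check an answer to the reverse conversion is to let the machine do it. The sketch below is not the hand conversion left as homework; it leans on Python's standard `struct` module to store a number in single precision and then reads the three fields back out. The helper name `float_to_fields` is hypothetical.

```python
import struct


def float_to_fields(x):
    """Return (sign bit, biased exponent, 23 mantissa bits as a string).

    Hypothetical helper: it stores x as a single-precision number and
    splits the resulting 32-bit word into its three fields.
    """
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31                      # top bit
    biased_exponent = (word >> 23) & 0xFF  # next 8 bits
    mantissa = word & 0x7FFFFF             # low 23 bits
    return sign, biased_exponent, format(mantissa, "023b")


# The number from the worked example should give back the original bits.
sign, biased, mantissa = float_to_fields(-55834574848.0)
print(sign, format(biased, "08b"), mantissa)
print("actual exponent:", biased - 127)
```

For numbers whose biased exponent falls in the range 1 to 254, subtracting 127 from the second field gives the actual exponent, which lies between -126 and 127.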

Now why do we not have 0 and 255? Because they are reserved to represent special numbers. When the biased exponent e prime is equal to 0, the exponent bits are all 0s. When e prime is equal to 255, they are all 1s. Those all 0s and all 1s stand for certain special numbers. We know that we cannot otherwise represent 0 exactly, because every number has the form 1, radix point, mantissa, so there is always a 1 before the radix point. So the way to represent 0 exactly is to choose e prime equal to 0, the sign bit 0, and all 0s in the mantissa. If you change the sign bit to 1, it represents -0. These are two distinct representations, both using this special value of the exponent.

Now, when e prime is 255, the exponent bits are all 1s, and what the number represents depends on the sign bit and on the mantissa. As an example, if the sign bit is 0, with all 1s in the exponent and all 0s in the mantissa, that corresponds to infinity. If the sign bit is 1, still with all 1s in the exponent and all 0s in the mantissa, it represents minus infinity. Then there is something called not a number. An example of not a number is something indeterminate, like 0 divided by 0, because you cannot say it is infinity or minus infinity or zero. Its representation has a 0 or a 1 in the sign bit, either one is good, all 1s in the exponent, and a mantissa that has to be nonzero.
That means that at least one of the bits in the mantissa has to be nonzero, and that represents not a number.

Now, in the same format, let's see the largest and the smallest numbers by magnitude that we can represent with thirty-two bits. For the largest, the 1 is always there before the radix point, and putting all 1s in the mantissa makes the number bigger; then we have 2 raised to the power of the largest exponent, which is 127, because we don't use 128. That comes to about 3.40 times 10 raised to the power 38. Same thing for the smallest: we have to have the 1 before the radix point, so we choose all 0s in the mantissa, and then we have 2 raised to the power -126, because that makes the number as small as possible. That comes to about 1.18 times 10 raised to the power -38. So those are the smallest and the largest numbers by magnitude that we can represent in this form in the thirty-two-bit word.

Machine epsilon, the smallest number that can be detected when it is added to 1, is 2 raised to the power -23. The reason is this: take the number 1.000...0, with all 0s in the mantissa and an exponent of 0, which is equal to 1. The next number that can be represented has a 1 in the twenty-third place of the mantissa, that is, 1.000...01. The difference between the two is 2 raised to the power -23, so 2 raised to the power -23 is the machine epsilon. Machine epsilon is also a measure of the relative error you get in representing numbers: all numbers represented in thirty-two-bit single precision will have a relative error of less than 2 raised to the power -23, so it gives you a good measure of how accurately your computer is representing your numbers. And that's the end of this segment.
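The special bit patterns, the largest and smallest magnitudes, and the machine epsilon described in this segment can all be reproduced with a short Python sketch. The helper name `bits_to_float` and the variable names are hypothetical. The sketch builds each thirty-two-bit word from its sign, biased exponent, and mantissa fields and lets the standard `struct` module interpret it as a single-precision number.

```python
import math
import struct


def bits_to_float(sign, biased_exponent, mantissa):
    """Hypothetical helper: assemble the three fields into a 32-bit word
    and reinterpret that word as a single-precision number."""
    word = (sign << 31) | (biased_exponent << 23) | mantissa
    return struct.unpack(">f", struct.pack(">I", word))[0]


ALL_ONES = 0x7FFFFF  # 23 mantissa bits, all set to 1

# Biased exponent 0 with an all-0s mantissa: the two zeros.
pos_zero = bits_to_float(0, 0, 0)
neg_zero = bits_to_float(1, 0, 0)

# Biased exponent 255 with an all-0s mantissa: the two infinities.
pos_inf = bits_to_float(0, 255, 0)
neg_inf = bits_to_float(1, 255, 0)

# Biased exponent 255 with a nonzero mantissa: not a number.
# Any nonzero mantissa qualifies; here only the top mantissa bit is set.
nan = bits_to_float(0, 255, 1 << 22)

# Largest magnitude: biased exponent 254 (actual 127), mantissa all 1s.
largest = bits_to_float(0, 254, ALL_ONES)

# Smallest magnitude with the assumed leading 1: biased exponent 1
# (actual -126), mantissa all 0s.
smallest = bits_to_float(0, 1, 0)

# Machine epsilon: gap between 1 and the next representable number,
# which has a 1 in the twenty-third mantissa place.
next_after_one = bits_to_float(0, 127, 1)
epsilon = next_after_one - 1.0

print(pos_zero, neg_zero, pos_inf, neg_inf, nan)
print(largest, smallest, epsilon)
```

Python's own floats are double precision, so every single-precision value above is held exactly and the subtraction that produces `epsilon` introduces no extra rounding.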