CHAPTER 01.05: FLOATING POINT REPRESENTATION: Accuracy of Floating Point Binary Numbers: The Machine Epsilon

 

In this segment, we'll talk about the accuracy of floating-point binary numbers. When we represent a base-10 number as a floating-point binary number, it is stored in a finite number of bits, so it is not necessarily represented exactly. What we want to figure out is how accurate that representation is.

 

So, let's see how we determine the accuracy of a floating-point number. The first thing we need, in order to have this discussion, is a definition of something called machine epsilon. Machine epsilon is the number we have to add to one in order to get the next machine number, the first one that is strictly greater than one. What that implies is this: suppose you have one as a machine number in your floating-point representation. There is going to be a next number that can be represented in the format, and the difference between those two numbers is epsilon mach, or what we call machine epsilon.
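
To make the definition a little more concrete, here is a minimal Python sketch. It assumes Python's built-in float is an IEEE 754 double-precision number, which is the case on typical platforms; this is not the 10-bit word used in this lesson, only an illustration of the same idea on a real machine.

```python
import sys

# Machine epsilon of Python's float (assumed to be IEEE 754 double precision).
eps = sys.float_info.epsilon

print(eps == 2.0 ** -52)     # True: a double stores 52 mantissa bits
print(1.0 + eps > 1.0)       # True: 1 + eps is the next float after 1
print(1.0 + eps / 4 == 1.0)  # True: a much smaller increment is lost
```

The last line shows why the definition matters: an increment well below machine epsilon cannot be told apart from one itself.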

 

Now, this might seem a little abstract, so let's see what it actually means. Let's look at an example where somebody says, hey, I've got 10 bits for representing a number. In this architecture, the first bit is used for the sign of the number, the next one for the sign of the exponent, the next four bits for the magnitude of the exponent, and the last four for the magnitude of the mantissa. If that is the case, the representation of one, that is, 1 in base 10, is 1.0000 (base 2) times 2 to the power 0000 (base 2). So, basically, you have four zeros that go into the magnitude-of-mantissa part and four zeros that go into the magnitude-of-exponent part.

Now, you might say, how did I get this? That is part of a previous video, and I have put the link in the YouTube description. I'm trying to keep this video as short as possible so that we can concentrate on the concept of machine epsilon and the accuracy of floating-point numbers, but you are more than welcome to look at the previous videos to see how we went from one to this representation.

Now, look at the next number. The next number is 1.0001 (base 2) times 2 to the power 0000 (base 2), because I'm putting a one in the last place of the mantissa and keeping everything else the same.
So, the question is, what is this number? That is what allows me to find the machine epsilon, because it is the next number after one. So, 1.0001 (base 2) times 2 to the power 0000 (base 2) is equal to 1 times 2 to the power 0 plus 1 times 2 to the power minus 4, all times 2 to the power 0. The leading one contributes 1 times 2 to the power 0, there is no contribution from the three zeros right after the radix point, and the last bit sits in the 2 to the power minus 4 place, because the places after the radix point are 2 to the power minus 1, 2 to the power minus 2, 2 to the power minus 3, and 2 to the power minus 4. When we do the calculation, this number is 1.0625 in base 10. It is the difference between this number and one that is the machine epsilon, because this is the next number available to me in the format I am using. So, the machine epsilon is 1.0625 minus 1, which gives 0.0625 in base 10.
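
If you want to check this arithmetic yourself, here is a short Python sketch. The helper name `decode_word` is made up for this illustration, and it handles only the mantissa bits and a plain integer exponent, not the sign bits of the 10-bit word.

```python
def decode_word(mantissa_bits: str, exponent: int = 0) -> float:
    """Value of 1.mantissa_bits (base 2) times 2**exponent (hypothetical helper)."""
    # Bit i after the radix point sits in the 2**-(i+1) place.
    fraction = sum(int(bit) * 2.0 ** -(i + 1) for i, bit in enumerate(mantissa_bits))
    return (1.0 + fraction) * 2.0 ** exponent

one = decode_word("0000")             # 1.0000 (base 2) times 2**0
next_after_one = decode_word("0001")  # 1.0001 (base 2) times 2**0
eps_mach = next_after_one - one

print(one, next_after_one, eps_mach)  # 1.0 1.0625 0.0625
```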

 

Now, there's another way to find the machine epsilon: it is 2 to the power minus the number of bits used for the mantissa. So, if you know the number of bits in the mantissa, you don't have to go through this whole procedure. I went through the procedure because I wanted to show you what the next number available after one is, but you can calculate it directly. Since we have four bits for the mantissa in this particular architecture, you get 2 to the power minus 4, which is 0.0625.

So, that's how we calculate the machine epsilon. Now you might say, hey, it's very good that we are able to find the machine epsilon, but so what? What is the purpose of machine epsilon? That's something we'll see through an example on the next slide. What is its significance for a student in an introductory course in numerical methods, especially one who is not a computer science major? Whenever we represent a number in floating-point format, in most cases it is not going to be exact. So, let's suppose we have a number x and it gets represented as y, because it cannot be represented exactly. What machine epsilon tells us is about the absolute relative true error, which is the absolute value of the exact number minus the approximate number, divided by the exact number, that is, |x - y| / |x|. You will always find it to be less than machine epsilon. That's the beauty of understanding machine epsilon.
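
The shortcut formula is easy to try out in Python. This is only a sketch; the function name `machine_epsilon` is chosen for this illustration, and the comparison on the last line assumes Python's float is an IEEE 754 double with 52 stored mantissa bits.

```python
import sys

def machine_epsilon(mantissa_bits: int) -> float:
    """Machine epsilon as 2 to the power minus the number of mantissa bits."""
    return 2.0 ** -mantissa_bits

print(machine_epsilon(4))                             # 0.0625, the 10-bit word in this lesson
print(machine_epsilon(52) == sys.float_info.epsilon)  # True on IEEE 754 double platforms
```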

 

So, this little example says, hey, you've got a ten-bit binary word: the first bit is for the sign of the number, the second one is for the sign of the exponent, the next four bits are for the magnitude of the exponent, and the last four bits are for the magnitude of the mantissa. So, I have one bit, a second bit, then a group of four bits, and another group of four bits. What we are doing is representing 0.02832, and we are trying to figure out how it is going to be represented in this 10-bit format. It is not going to be represented exactly, so we want to see the difference between what it is and what it gets represented as, and then check whether the absolute relative true error is less than the machine epsilon.

We are going to skip some steps here, mainly because they have already been done in previous lessons and we want to keep this lesson short, but if you go to the description on the YouTube channel, you'll see the complete playlist as well as where the other videos are. What we find is that this number can be written in binary with five zeros after the radix point and then 1 1 1 0 0, that is, 0.0000011100 (base 2), and keep in mind that there are more bits after this. In fact, the number of bits it would take to represent this number exactly is infinite, in the fixed-point format as well as in the floating-point format. In order to represent it in the floating-point format, we've got to move the radix point to just after the first nonzero bit, which is the first one here, so the number is approximately 1.1100 (base 2) times 2 to the power minus 6 (base 10).
The reason it's approximate is that I took only the bits 1 1 1 0 0, but there are many other bits after that. So, now, this turns out to be 1.1100 (base 2) times 2 to the power minus 0110 (base 2). That's because 6 converted to base 2 is 110, and we prepend a zero because we have four bits for the magnitude of the exponent. The reason we show only four bits of the mantissa after the radix point is that we have only four bits for the magnitude of the mantissa. So, that's the machine equivalent of this number.

We know that the stored number is not the same as the original number, and what we want to see is the difference between the two. What we have is 1.1100 (base 2) times 2 to the power minus 0110 (base 2). To convert it to base 10, we have the leading one, and after the radix point we have nonzero bits in the 2 to the power minus 1 and 2 to the power minus 2 places, and we already know that the exponent is 6 in base 10. So the mantissa is 1 plus 0.5 plus 0.25, and the number turns out to be 1.75 (base 10) times 2 to the power minus 6, which is 0.02734375 in base 10. Now, of course, what you find is that this is not the same as the number we started with. We started with 0.02832, and it gets represented, approximately, as 0.02734375. So, there is a difference between the actual number and the number that is represented.
So, in this case, we want to look at the absolute relative true error. It is the exact value, 0.02832, minus the approximate value, 0.02734375, divided by the exact value, all in absolute value, and this number turns out to be 0.034472.
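
The whole example can be reproduced with a few lines of Python. This is a sketch under some assumptions: the helper `chop_to_word` is a made-up name, it handles positive numbers only, it chops the extra mantissa bits (as done in this lesson) instead of rounding, and it does not check that the exponent fits in the four exponent bits of the word.

```python
import math

MANTISSA_BITS = 4  # bits for the magnitude of the mantissa in the 10-bit word

def chop_to_word(x: float, mantissa_bits: int = MANTISSA_BITS):
    """Return (mantissa, exponent) so that x is about mantissa * 2**exponent,
    with 1 <= mantissa < 2 and only `mantissa_bits` bits kept after the radix point.
    Hypothetical helper; assumes x > 0."""
    frac, exp = math.frexp(x)                  # x = frac * 2**exp with 0.5 <= frac < 1
    mantissa, exponent = frac * 2.0, exp - 1   # renormalize to 1 <= mantissa < 2
    scale = 2 ** mantissa_bits
    chopped = math.floor(mantissa * scale) / scale  # drop the bits that do not fit
    return chopped, exponent

x = 0.02832
mantissa, exponent = chop_to_word(x)
y = mantissa * 2.0 ** exponent
rel_err = abs(x - y) / abs(x)

print(mantissa, exponent, y)            # 1.75 -6 0.02734375
print(round(rel_err, 6))                # 0.034472
print(rel_err < 2.0 ** -MANTISSA_BITS)  # True: below the machine epsilon 0.0625
```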

 

Now, you can see that this is less than the machine epsilon, and the machine epsilon is nothing but 2 to the power minus 4, which is 0.0625. So, what you find is that the number got represented approximately, and when I calculated the absolute relative true error, it turned out to be strictly less than the machine epsilon of that particular architecture. Any number that can be represented in this format will always have an absolute relative true error less than 0.0625. So, you have confidence in how the number is getting represented: no matter what number it is, I don't have to worry about the maximum possible relative true error in the representation, because it will be less than 0.0625 in this case. That's the beauty of the machine epsilon. And that's the end of this segment.
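
If you would like to convince yourself that this holds for more than one number, here is a rough Python experiment. It is only a sketch: the helper `chop` is a made-up name, it assumes positive numbers and chopping of the mantissa to four bits, and it draws random test values between 2 to the power minus 15 and 2 to the power 15, which is assumed here to be within the range that a four-bit exponent magnitude can cover.

```python
import math
import random

def chop(x: float, mantissa_bits: int) -> float:
    """Keep only `mantissa_bits` bits after the radix point of the normalized
    mantissa of x (hypothetical helper; assumes x > 0, chopping not rounding)."""
    frac, exp = math.frexp(x)   # x = frac * 2**exp with 0.5 <= frac < 1
    scale = 2 ** mantissa_bits
    mantissa = math.floor(frac * 2.0 * scale) / scale  # 1 <= mantissa < 2
    return mantissa * 2.0 ** (exp - 1)

random.seed(0)           # fixed seed so the experiment is repeatable
eps_mach = 2.0 ** -4     # machine epsilon of the 10-bit word
worst = 0.0
for _ in range(100_000):
    x = random.uniform(2.0 ** -15, 2.0 ** 15)
    worst = max(worst, abs(x - chop(x, 4)) / x)

print(worst < eps_mach)  # True: every sampled relative error is below 0.0625
```

A random experiment like this is not a proof, but it agrees with the statement that the absolute relative true error stays below the machine epsilon.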