|
CHAPTER 01.05: FLOATING POINT REPRESENTATION: Accuracy of Floating Point
Binary Numbers: The Machine Epsilon

In this segment, we'll talk about the
accuracy of floating-point binary numbers. When we convert any number from
base 10 into a floating-point binary number, it's going to be represented in
only a finite number of bits, so it's not necessarily going to be represented
exactly. What we will figure out is how accurate that representation is. So,
let's go and see how we determine the accuracy of a floating-point number.
The first thing
which we need to realize is that there's something called machine epsilon,
which we need to define in order to have this discussion. Machine epsilon is
the number which we have to add to one in order to get the next number that
is strictly greater than one. What that simply implies is this: suppose you
have one as a machine number in your floating-point representation. There is
going to be a next number which can be represented in that floating-point
representation, and to get from one to that next number you have to add some
number to one. That number is epsilon mach, or what we call machine epsilon.
Now, this might seem a little bit abstract, so let's go and see what it
actually means. Let's look at an example where somebody says, hey, I
have 10 bits for representing a number. So, we have this architecture, and of
these 10 bits we are using the first one for the sign of the number, the next
one for the sign of the exponent, the next four bits for the magnitude of the
exponent, and the last four for the magnitude of the mantissa, let's suppose.
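
As a rough sketch of how a word in this 10-bit format could be turned back into a base-10 value, here is a short Python function. The name `decode_word` is hypothetical, and the sketch assumes a 0 sign bit means positive, four bits each for the exponent and mantissa magnitudes, and a leading one in front of the radix point that is implied rather than stored.

```python
def decode_word(bits: str) -> float:
    """Decode a 10-bit word laid out as:
    [sign of number][sign of exponent][4 exponent bits][4 mantissa bits].

    Assumptions (not stated in the segment): a sign bit of 0 means positive,
    and the leading 1 before the radix point is implied, not stored.
    """
    if len(bits) != 10 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of exactly 10 binary digits")
    sign = -1 if bits[0] == "1" else 1
    exponent_sign = -1 if bits[1] == "1" else 1
    exponent = exponent_sign * int(bits[2:6], 2)
    # The four mantissa bits sit in the 2^-1 .. 2^-4 places after the radix point.
    mantissa = 1 + int(bits[6:10], 2) / 2**4
    return sign * mantissa * 2.0**exponent


# All exponent and mantissa bits zero decodes to one.
print(decode_word("0000000000"))  # 1.0
```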
So, if that is the case, the representation of one, that is one base ten,
will be one point zero zero zero zero base two times two to the power zero
zero zero zero base two. So, basically, you'll have four zeros which go into
the magnitude of the mantissa part and four zeros which go into the magnitude
of the exponent part. Now, you
might say, how did I get this? This is part of a previous video, and I put
that link in the YouTube description. I'm trying to keep this video as short
as possible so that we can concentrate on the concept of machine epsilon and
the accuracy of floating-point numbers, but you are more than welcome to look
at the previous videos to see how we went from here to here. Now, if we look
at the next number, the next number is here. That's the next number because
I'm basically putting a one in the last place of the mantissa and keeping
everything else the same.
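
For readers who cannot see the slide, the two machine numbers just described can be written in symbols as follows, assuming four mantissa bits and four exponent-magnitude bits:

```latex
\begin{align*}
1 &= (1.0000)_2 \times 2^{(0000)_2} \\
\text{next number after } 1 &= (1.0001)_2 \times 2^{(0000)_2}
\end{align*}
```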
So, the question is, what is this number? That's what's going to allow me to
find out what the machine epsilon is, because that's the next number. So, one
point zero zero zero one base 2 times 2 to the power zero zero zero zero base
2 is equal to 1 times 2 to the power 0 plus 1 times 2 to the power minus 4,
all times 2 to the power 0. I get the one here, that's 1 times 2 to the power
0, there is no contribution from these three zeros right here, and then the
last one is in the 2 to the power minus 4 place, because this is 2 to the
power minus 1, this is 2 to the power minus 2, this is 2 to the power minus
3, so this is 2 to the power minus 4. When we do the calculation, this number
turns out to be 1.0625 base 10. So, it is the difference between this number
and one which is the machine epsilon, because that's the next number which is
available to me in the format which I'm using. So, the machine epsilon will
be one point zero six two five minus one, and that gives me zero point zero
six two five base 10 as my machine epsilon. Now, there's another way to find the machine
epsilon. You can find machine epsilon by this formula as well: it's two to
the power minus the number of bits used for the mantissa. So, if you know the
number of bits in the mantissa, you don't have to go through this whole
procedure in order to find machine epsilon. I went through the procedure
because I wanted to show you what the next number available after one is, but
you can just calculate it directly. Since we have four bits for the mantissa
in this particular architecture, you will get two to the power minus four,
which is zero point zero six two five. So, that's how we calculate the
machine epsilon. Now you might say, hey, this is very good that we are able
to find the machine epsilon, but so what? What is the purpose of machine
epsilon? That's something which we'll see through an example in the next
slide. So, what is the
significance of machine epsilon for a student, especially in an introductory
course in numerical methods, and especially for a person who's not a computer
science major? Whenever we represent a number in floating-point format, in
most cases it is not going to be exact. So, let's suppose we have a number X
and it gets represented as Y, because it's not represented exactly. What
machine epsilon is telling us is that the absolute relative true error, which
is the absolute value of the exact number minus the approximate number
divided by the exact number, X minus Y divided by X, is always going to be
less than machine epsilon. So, that's the beauty of understanding the machine
epsilon. So, if you look at this little example, it's
saying that you have a ten-bit binary word: the first bit is for the sign of
the number, the second one is for the sign of the exponent, the next four
bits are for the magnitude of the exponent, and the last four bits are for
the magnitude of the mantissa. So, I have one, two, then a group of four
bits, and another group of four bits. This is for the sign of the number,
this is for the sign of the exponent, this one is for the magnitude of the
exponent, and this one is for the magnitude of the mantissa. What we are
doing is representing zero point zero two eight three two, and we are trying
to figure out how it is going to be represented in this particular 10-bit
format. It's not going to be represented exactly, so we want to see what the
difference is between what it is and what it is represented by, and then
check whether the absolute relative true error is less than the machine
epsilon. So, in this case
we are going to skip some steps, mainly because they have already been done
in the previous lessons and we want to keep this lesson short, but if you go
to the description on the YouTube channel, you'll be able to see the complete
playlist as well as where the other videos are. So, here what we find is that
this number can be written in binary format as zero point, five zeros after
the radix point, and then one one one zero zero, like that. Keep in mind that
there are more bits after this. In fact, the number of bits it would take to
represent this number exactly is infinite, in the fixed-point format as well
as in the floating-point format, as you will see. In order to represent this
in the floating-point format, we've got to move the radix point to just after
the first nonzero bit, which is this one here, so this will be approximately
one point one one zero zero base two times two to the power minus six base ten.
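
The normalization step just described can be sketched in Python. The helper name `normalize_and_chop` is hypothetical, and the sketch assumes a positive input and that the extra bits are simply chopped off, which matches the one point one one zero zero result above.

```python
import math


def normalize_and_chop(x: float, mantissa_bits: int = 4):
    """Return (mantissa_bit_string, exponent) so that
    x is approximately (1.bits) base 2 times 2**exponent.

    Hypothetical helper: assumes x > 0 and chops (discards) all bits
    beyond the first `mantissa_bits` after the radix point.
    """
    m, e = math.frexp(x)      # x == m * 2**e with 0.5 <= m < 1
    significand = 2 * m       # rescale so that 1 <= significand < 2
    exponent = e - 1
    # Keep only the first `mantissa_bits` bits after the radix point.
    kept = int((significand - 1) * 2**mantissa_bits)
    return format(kept, f"0{mantissa_bits}b"), exponent


bits, exponent = normalize_and_chop(0.02832)
# Magnitude of the exponent written with four bits, as in the 10-bit format.
exponent_bits = format(abs(exponent), "04b")
print(bits, exponent, exponent_bits)  # 1100 -6 0110
```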
The reason why it's approximate is that I only took care of these bits right
here, one one one zero zero, but there are many other bits after that. So,
now, this will turn out to be one point one one zero zero base two times two
to the power minus zero one one zero base two. That's because six converted
to base 2 is one one zero, and I'm putting a zero in front of it because we
have four bits for the magnitude of the exponent. The reason why we are
showing only four bits of the mantissa is that we have only four bits for the
magnitude of the mantissa right here. So, that's the floating-point
equivalent of this number. We know that this number here is not the same as
the number we started with, and what we want to see is the difference between
those two numbers. So, what we have is one point one one zero zero base two
times two to the power minus zero one one zero base two. If we want to
convert it to base ten,
this will be the following number. Here we have one, which is coming from
here, and then after the radix point we have nonzero bits at the two to the
power minus one and the two to the power minus two places, and, of course, we
already know that the exponent is six in base ten, as seen previously. So,
this number turns out to be one point seven five base ten times two to the
power minus 6, which is zero point zero two seven three four three seven five
base 10. Now, of course, what you are finding out here is that this number is
not the same as the number which we started with. We started with zero point
zero two eight three two, and it gets represented approximately as zero point
zero two seven three four three seven five. So, there is a difference between
what the actual number was and the number which is represented. So, in this
case we want to look at the absolute relative true error. It will be the
exact value, which is this, minus the approximate value, divided by the exact
value, and this number turns out to be equal to
zero point zero three four four seven two. Now, you can see that this is less than the machine epsilon, and the machine epsilon is nothing but two to the power minus four, which is zero point zero six two five. So, what you are finding out here is that the number got represented approximately, and when I calculated the absolute relative true error, it turned out to be strictly less than the machine epsilon of that particular architecture. So, any number which can be represented in this format will always have an absolute relative true error which is less than zero point zero six two five. That gives you confidence in how the number is getting represented: no matter what number it is, I don't have to worry about what the maximum possible relative true error in its representation would be, because it will be less than zero point zero six two five in this case. And that's what the beauty of the machine epsilon is. And that's the end of this segment. |
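
The check carried out in this segment can be reproduced with a short Python sketch. The names `chop` and `abs_rel_true_error` are hypothetical, the sketch assumes positive inputs and chopping to four mantissa bits, and it ignores the limited range of the four-bit exponent.

```python
import math

MANTISSA_BITS = 4
MACHINE_EPS = 2.0 ** -MANTISSA_BITS  # 2^-4 = 0.0625


def chop(x: float, mantissa_bits: int = MANTISSA_BITS) -> float:
    """Value of x after keeping only `mantissa_bits` bits after the radix point.

    Hypothetical helper: assumes x > 0 and ignores the exponent range limit.
    """
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= m < 1
    # 2*m lies in [1, 2); scale it, drop the fractional part, scale back.
    kept = math.floor(2 * m * 2**mantissa_bits)
    return kept / 2**mantissa_bits * 2.0 ** (e - 1)


def abs_rel_true_error(exact: float, approx: float) -> float:
    """Absolute relative true error: |exact - approx| / |exact|."""
    return abs((exact - approx) / exact)


x = 0.02832
y = chop(x)
err = abs_rel_true_error(x, y)
print(y)                  # 0.02734375
print(round(err, 6))      # 0.034472
print(err < MACHINE_EPS)  # True

# Spot check of the bound on a spread of other positive values.
samples = [k * 0.0137 for k in range(1, 2001)]
worst = max(abs_rel_true_error(v, chop(v)) for v in samples)
print(worst < MACHINE_EPS)  # True
```

The sweep at the end is only a spot check on sample values, not a proof of the bound.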