Deriving The Linear Regression Formula
In this segment, we'll derive the general straight line regression model. So,
the formulation of the problem is that we are given x1 comma y1, all the way
up to x sub n comma y sub n, so we're given n data points. What we want to be
able to do is best fit y is equal to a naught plus a1 x to the data. So,
that's what we're trying to do. So, if we
had, let's suppose some points given to us as y as a function of x, what we
want to be able to do is we want to be able to draw a straight line, which
will best fit all the data points which are given to us. So, let's suppose
this point is x sub i comma y sub i. We know that this line we are writing as
y is equal to a naught plus a1 x. That means that if I choose this point right
here on the line, its x coordinate will be x sub i, because it has the same x
value, but the value of y there will be a naught plus a1 x sub i. So, this
vertical distance which you are seeing here is what the least squares method
works with: we take all these vertical distances, square them, and try to
minimize the sum of those squares. So, if you look at the residual which you
will have here, e sub i will be the observed value minus the predicted value.
That is this distance right here. The observed value is y sub i, and the
predicted value is a naught plus a1 x sub i. And when we say sum of the
squares of the residuals, we are taking all these residuals, squaring them,
and then adding them up. So the sum of the squares of the residuals, Sr, is
the summation, i is equal to 1 to n, of e sub i squared, and we can expand
this to write it as the summation, i is equal to 1 to n, of y sub i minus a
naught minus a1 x sub i, all squared.
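Written out in symbols, the residual and the sum of the squares of the
residuals described here would read roughly as follows. The symbols e_i and
S_r follow the spoken names; the exact typesetting used on the original
slides is an assumption.

```latex
\begin{align*}
e_i &= y_i - (a_0 + a_1 x_i) \\
S_r &= \sum_{i=1}^{n} e_i^2
     = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2
\end{align*}
```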
So, what do we want to do? We want to be able to minimize this value of Sr in
order to find the best fit of the line. So, if this is the sum of the squares
of the residuals, Sr, how do we go about finding the minimum of this quantity?
That is the goal. We want to make Sr as small as possible, which means that
we're trying to find the absolute minimum of it. And we can minimize only with
respect to a naught and a1, because y sub i is fixed and x sub i is fixed, so
the only things which are under our control are a naught and a1. So, how do we
go about doing this?
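To make the point concrete that Sr depends only on a naught and a1 once the
data are fixed, here is a minimal Python sketch. The function name
`sum_of_squared_residuals` and the data values are hypothetical, made up for
illustration; they are not from the lecture.

```python
def sum_of_squared_residuals(x, y, a0, a1):
    """Return Sr, the sum of (y_i - a0 - a1 * x_i)**2 over all data points."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of points")
    return sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, invented for this example only.
x_data = [1.0, 2.0, 3.0, 4.0]
y_data = [2.0, 4.1, 5.9, 8.2]

# The data stay fixed; only the candidate line (a0, a1) changes.
sr_flat = sum_of_squared_residuals(x_data, y_data, a0=5.0, a1=0.0)
sr_sloped = sum_of_squared_residuals(x_data, y_data, a0=0.0, a1=2.0)

print("Sr for y = 5:", sr_flat)
print("Sr for y = 2x:", sr_sloped)
```

For this made-up data the sloped line gives a much smaller Sr than the flat
one, which is the sense in which one candidate line fits better than another.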
So, let's do the first derivative part here. The first step in finding the
absolute minimum of Sr is to take the derivative of Sr with respect to a
naught, and that will be the summation, i is equal to 1 to n. We can apply the
chain rule here: we get 2 times y sub i minus a naught minus a1 x sub i,
multiplied by the derivative of what's in the parentheses with respect to a
naught, which is minus 1. Same thing for del Sr by del a1. Again, we apply the
chain rule: we get 2 times y sub i minus a naught minus a1 x sub i, multiplied
by the derivative of what's in the parentheses with respect to a1, which is
just minus x sub i. We want to put both of these equal to zero, and find the
values of a naught and a1 for which the partial derivatives with respect to a
naught and with respect to a1 are equal to zero. Let's go and look at the
first equation. We're going to expand it, so I’m basically taking the
summation and breaking it up into three separate summations: minus 2
summation of y sub i, plus 2 summation of a naught, plus 2 summation of a1 x
sub i, equal to zero, with each summation running from i is equal to 1 to n.
I’m going to do the same thing for the second equation. The first term will be
minus 2 summation, i is equal to 1 to n, of x sub i y sub i, because we have
minus x sub i multiplying y sub i here, so don't forget about that. Then we
have plus 2 summation, i is equal to 1 to n, of a naught x sub i, and
similarly plus 2 summation, i is equal to 1 to n, of a1 x sub i squared, equal
to zero. So, what you are basically seeing is that we're ending up with two
equations and two unknowns, and we're going to simplify a little bit further
to find out what we get. You can very well see that you have 2 here, 2 here,
2 here, so we can divide by 2.
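For reference, the two first-derivative conditions and their expanded forms
discussed here can be written as below. This is a reconstruction from the
spoken description, so the layout is an assumption, but each term follows from
the chain-rule steps just described.

```latex
\begin{align*}
\frac{\partial S_r}{\partial a_0}
  &= \sum_{i=1}^{n} 2\,(y_i - a_0 - a_1 x_i)(-1) = 0 \\
\frac{\partial S_r}{\partial a_1}
  &= \sum_{i=1}^{n} 2\,(y_i - a_0 - a_1 x_i)(-x_i) = 0 \\[6pt]
-2\sum_{i=1}^{n} y_i + 2\sum_{i=1}^{n} a_0 + 2\sum_{i=1}^{n} a_1 x_i &= 0 \\
-2\sum_{i=1}^{n} x_i y_i + 2\sum_{i=1}^{n} a_0 x_i + 2\sum_{i=1}^{n} a_1 x_i^2 &= 0
\end{align*}
```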
Same thing for the second equation: we have 2 here, and 2 here, and 2 here, so
we can divide it by 2. So, let’s go ahead and do that. I’m going to divide
both equations by 2, and if I do that, this is what I’m going to get. From the
first equation, I get minus a summation of the y terms, plus a summation of a
naught from 1 to n, plus a summation of a term which has a1 in it, equal to
zero. From the second equation, I get minus the summation of x sub i y sub i,
plus the summation of a naught x sub i, plus the summation of a1 x sub i
squared, equal to zero. So, basically what I have is two equations and two
unknowns, and I want to be able to find out what a naught and a1 are. The
summation of y sub i and the summation of x sub i y sub i are known
quantities, so I’m going to move them to the right-hand side, because that's
what we do when we set up simultaneous linear equations: we keep the unknowns
on the left-hand side and the knowns on the right-hand side. But what is the
summation of a naught? This is a naught being added to itself n times, so that
will give me n times a naught, right? And then I’m going to take a1 outside of
its summation, and inside I’ll get the summation of x sub i, and the summation
of y sub i will go to the right-hand side. So equation 1 is n times a naught
plus a1 times the summation of x sub i, equal to the summation of y sub i.
Now, let's look at the second equation. I’m going to take a naught outside,
and x sub i is getting summed there, and then a1 outside, and I’ll have x sub
i squared being summed in the summation, and the summation of x sub i y sub i
will be moved to the right-hand side because that's a known quantity. So
equation 2 is a naught times the summation of x sub i plus a1 times the
summation of x sub i squared, equal to the summation of x sub i y sub i. So,
now we can clearly see that I have two equations and two unknowns, and I
should be able to solve them. So,
I’m not going to go through all the algebra, but basically what I would do is
multiply equation 1 by the summation of x sub i, and equation 2 by n, and
subtract. And why would I do that? Because that will allow me to get rid of
the a naught terms. If I take equation 1 and multiply it by the summation of x
sub i, and I take the second equation and multiply it by n, what's going to
happen is that the a naught terms will turn out to be the same. So, I’m going
to subtract the two, and when I do, this is what I’m going to get. I’m getting
a1 is equal to n times the summation, i is equal to 1 to n, of x sub i y sub
i, minus the summation of the x values times the summation of the y values:
that's the numerator. And the denominator is n times the summation of x sub i
squared, minus the square of the summation of x sub i. So, once I have found
a1, how do I find a naught? I can find a naught by going back to equation 1,
because I have found a1, and I can plug it back into equation 1 and find out
what a naught is. So, let's go ahead and do that. So, if you
look at equation 1, what did we have? We had n times a naught plus a1 times
the summation of x sub i, equal to the summation of the y values. And since I
already know a1, I can find a naught by taking the a1 term to the right-hand
side and dividing by n. Keep in mind that a1 is a known quantity now, so
that's why I was able to move it to the right-hand side. So, I get a naught is
equal to the summation of y sub i divided by n, minus a1 times the summation
of x sub i divided by n, and you can recognize that the first is nothing but
the average value of y, and the second is nothing but the average value of x.
So, I get y bar minus a1 x bar, and that'll be a naught. Okay, and just recall
that a1 is n times the summation of x sub i y sub i minus the summation of x
sub i times the summation of y sub i, divided by n times the summation of x
sub i squared minus the square of the summation of x sub i. So, we can find
a1, and then we can find a naught, and that's what gives us the best fit
model. But that's
not the end of it, because the only thing which we have done is the first
derivative part. We took the first partial derivatives of Sr with respect to a
naught and a1, put them equal to 0, and got a single solution to the problem.
But what does that mean? It means that there is a critical point at those
values of a naught and a1; it does not tell us whether it's an absolute
minimum. The only thing it is telling us is that this critical point can be a
local minimum, a local maximum, a saddle point, or maybe even none of those.
So let's see how we convince ourselves that it also corresponds to an absolute
minimum. First, there is the second derivative test. We're not showing it here
because that's a little bit beyond the scope of the course, but the second
derivative test tells us that the a naught and a1 values found correspond to a
local minimum. So, we know it's a local minimum, but is it an absolute
minimum? How do we say that it's an absolute minimum? It's because of two
things. One is that Sr is a continuous function of a naught and a1. The second
is that putting del Sr by del a naught and del Sr by del a1 equal to 0, the
first derivative part which we did, yields only one solution. It did not give
us multiple solutions; it gave us only one solution. So, the values of a
naught and a1 which we found not only correspond to a local minimum, but also
correspond to an absolute minimum, because Sr is a continuous function of a
naught and a1, and because by putting the first derivatives equal to zero, we
got only one solution.
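The segment skips the second derivative test. As a supplementary sketch that
is not part of the original lecture, the second partial derivatives of Sr can
be written down directly from the first-derivative expressions. They are
constants, and the test quantity is positive whenever the x values are not all
identical, which is the assumption made here.

```latex
\begin{align*}
\frac{\partial^2 S_r}{\partial a_0^2} &= 2n, \qquad
\frac{\partial^2 S_r}{\partial a_1^2} = 2\sum_{i=1}^{n} x_i^2, \qquad
\frac{\partial^2 S_r}{\partial a_0\,\partial a_1} = 2\sum_{i=1}^{n} x_i \\[6pt]
\frac{\partial^2 S_r}{\partial a_0^2}\,
\frac{\partial^2 S_r}{\partial a_1^2}
 - \left( \frac{\partial^2 S_r}{\partial a_0\,\partial a_1} \right)^{2}
 &= 4\left[ n\sum_{i=1}^{n} x_i^2 - \Big( \sum_{i=1}^{n} x_i \Big)^{2} \right]
  = 4n\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2 > 0
\end{align*}
```

Together with 2n being positive, this gives a local minimum. Because these
second derivatives do not depend on a naught or a1, Sr is a convex quadratic,
which is another way to see that the minimum is an absolute one.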
And that's the end of the derivation of the linear regression model, and it
is the end of the segment.
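The closed-form result of the derivation can be sketched in a few lines of
Python. This is a minimal illustration, not code from the lecture: the
function name `fit_line` and the sample data are hypothetical, and the data
were chosen to lie exactly on a line so the result is easy to check by hand.

```python
def fit_line(x, y):
    """Return (a0, a1) for the least squares line y = a0 + a1 * x."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same number of points")
    if n < 2:
        raise ValueError("at least two data points are needed")

    # The four summations that appear in the two simultaneous equations.
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)

    # Denominator of a1; it is zero only if every x value is the same.
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    a1 = (n * sum_xy - sum_x * sum_y) / denominator
    a0 = sum_y / n - a1 * sum_x / n  # y bar minus a1 times x bar
    return a0, a1


# Hypothetical data lying exactly on y = 1 + 2x.
x_data = [0.0, 1.0, 2.0, 3.0]
y_data = [1.0, 3.0, 5.0, 7.0]

a0, a1 = fit_line(x_data, y_data)
print("a0 =", a0, "a1 =", a1)
```

The check on the denominator is a design choice added here: when all x values
coincide, the two simultaneous equations no longer have a unique solution, so
the sketch raises an error instead of dividing by zero.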