Deriving The Linear Regression Formula

                In this segment, we'll derive the general straight-line regression model. The formulation of the problem is that we are given (x1, y1), all the way up to (xn, yn), so we're given n data points. What we want to be able to do is best fit y is equal to a naught plus a1 x to the data. So, that's what we're trying to do.

                So, suppose we had some points given to us as y as a function of x. What we want to be able to do is draw a straight line which will best fit all the data points which are given to us. Let's suppose this point is (x sub i, y sub i). We know that we are writing this line as y is equal to a naught plus a1 x. That means that if I choose the point on the line right here, its x coordinate will be x sub i, because it's the same x value, but the value of y there will be a naught plus a1 x sub i. So, look at this vertical distance which you are seeing here. What we are doing by using the least squares method is taking all these vertical distances, squaring them, and trying to minimize the sum of those squares. The residual which you have here, e sub i, will be the observed value minus the predicted value. That is this distance right here. The observed value is y sub i, and the predicted value is a naught plus a1 x sub i. When we say sum of the squares of the residuals, we are taking all these residuals, squaring them, and then adding them up. So the sum of the squares of the residuals, Sr, will turn out to be the summation, i is equal to 1 to n, of e sub i squared, and we can expand this to write it as the summation, i is equal to 1 to n, of y sub i minus a naught minus a1 x sub i, all squared. So, what do we want to do? We want to minimize this value of Sr in order to find the best fit of the line. So, if this is the sum of the squares of the residuals, Sr, how do we go about finding the minimum of this quantity? That is the goal.
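The on-screen expressions are not captured in the transcript. From the definitions spoken above, the residual and the sum of the squares of the residuals can be written as follows (the symbols e_i and S_r are the notation assumed here for "e sub i" and "Sr"):

```latex
e_i = y_i - \left( a_0 + a_1 x_i \right), \qquad
S_r = \sum_{i=1}^{n} e_i^{2}
    = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^{2}
```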

                So, the sum of the squares of the residuals is given by this expression now, right? And what we want to do is make this sum of the squares of the residuals, Sr, as small as possible, which means that we're trying to find the absolute minimum of it. And we minimize with respect to a naught and a1, because y sub i is fixed and x sub i is fixed, so the only things which are under our control are a naught and a1. So, how do we go about doing this? Let's do the first derivative part of finding the absolute minimum of Sr. Let's take the partial derivative of Sr with respect to a naught, and that'll be summation, i is equal to 1 to n. We can apply the chain rule here: we get 2 times y sub i minus a naught minus a1 x sub i, multiplied by the derivative of what's in the parentheses with respect to a naught, which is minus 1. Same thing for del Sr by del a1. Again we apply the chain rule: we get 2 times y sub i minus a naught minus a1 x sub i, multiplied by, hey, what is the derivative of what's in the parentheses here? With respect to a1, it will be just minus x sub i. We want to put the first derivative equal to zero, and we want to put the second one equal to zero. And we want to find out, hey, what will be those values of a naught and a1 so that both partial derivatives are equal to zero? Let's look at the first equation. What we're going to do is expand it, so I'm basically taking the summation and breaking it up into three separate summations: minus 2 times the summation of y sub i, plus 2 times the summation of a naught, plus 2 times the summation of a1 x sub i, equal to zero. I'm going to do the same thing with the second equation. The first term will be minus 2 summation, i is equal to 1 to n, of x sub i y sub i, because we have minus x sub i multiplying y sub i here, so don't forget about that x sub i. Then plus 2 summation, i is equal to 1 to n, of a naught x sub i, and something similar for the a1 term, which picks up x sub i squared. So, what you are basically seeing is that, hey, we're ending up with two equations and two unknowns, and we're going to simplify a little bit further to find out what we get. You can very well see that you have a 2 here, a 2 here, and a 2 here, so we can divide by 2. Same thing in the second equation: we have a 2 here, a 2 here, and a 2 here, so we can divide it by 2. So, let's go ahead and do that.
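Written out, the two chain-rule steps and their expansions described above would look like this. This is a reconstruction from the spoken description, with S_r denoting the sum of the squares of the residuals:

```latex
\frac{\partial S_r}{\partial a_0}
  = \sum_{i=1}^{n} 2 \left( y_i - a_0 - a_1 x_i \right)(-1) = 0
\quad\Longrightarrow\quad
-2 \sum_{i=1}^{n} y_i + 2 \sum_{i=1}^{n} a_0 + 2 \sum_{i=1}^{n} a_1 x_i = 0

\frac{\partial S_r}{\partial a_1}
  = \sum_{i=1}^{n} 2 \left( y_i - a_0 - a_1 x_i \right)(-x_i) = 0
\quad\Longrightarrow\quad
-2 \sum_{i=1}^{n} x_i y_i + 2 \sum_{i=1}^{n} a_0 x_i + 2 \sum_{i=1}^{n} a_1 x_i^{2} = 0
```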

                So, I'm going to divide both equations by 2, and if I do that, this is what I'm going to get. In the first equation I get minus a summation of the y terms, plus a summation of a naught from 1 to n, plus a summation of a term which has a1 in it, equal to zero. The next equation looks similar: minus the summation of x sub i y sub i, plus the summation of a naught x sub i, plus the summation of a1 x sub i squared, equal to zero. So, basically what I have is two equations and two unknowns, and I want to find out, hey, what are a naught and a1? The summation of y sub i and the summation of x sub i y sub i are two known quantities, so I'm going to move them to the right-hand side, because that's what we do when we set up simultaneous linear equations: we keep the unknowns on the left-hand side and the knowns on the right-hand side. But what is the summation of a naught? It is a naught being added to itself n times, so that gives me n times a naught, right? Then I'm going to take a1 outside of the summation, and inside I'll get the summation of x sub i, and the summation of y sub i goes to the right-hand side. So, n times a naught plus a1 times the summation of x sub i is equal to the summation of y sub i, and that's equation 1. Now, let's look at the second equation. I'm going to take a naught outside, with x sub i getting summed, and then a1 outside, with x sub i squared being summed, and the summation of x sub i y sub i will be moved to the right-hand side because that's a known quantity. That's equation 2. So, now we can clearly see that I have two equations and two unknowns, and I should be able to solve them. I'm not going to go through all the algebra, but basically I'll multiply equation 1 by the summation of x sub i, multiply equation 2 by n, and subtract. Why would I do that? Because after those multiplications the a naught terms turn out to be the same in both equations, so subtracting the two gets rid of a naught. When I subtract the two, this is what I get: a1 is equal to n times the summation, i is equal to 1 to n, of x sub i y sub i, minus the summation of the x values times the summation of the y values. That's the numerator. The denominator is n times the summation of x sub i squared, minus the square of the summation of x sub i. So, once I've found a1, how do I find a naught? I can go back to equation 1, plug in the a1 which I found right here, and find out what a naught is. So, let's go ahead and do that.
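The two simultaneous equations and the elimination step described above can be sketched as follows. The equation labels (1) and (2) follow the spoken "equation 1" and "equation 2"; the intermediate line is the algebra the lecture skips, filled in here as a reconstruction:

```latex
n a_0 + a_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \tag{1}

a_0 \sum_{i=1}^{n} x_i + a_1 \sum_{i=1}^{n} x_i^{2} = \sum_{i=1}^{n} x_i y_i \tag{2}

% n \times (2) minus \left( \sum x_i \right) \times (1) eliminates a_0:
a_1 \left[ n \sum_{i=1}^{n} x_i^{2} - \left( \sum_{i=1}^{n} x_i \right)^{2} \right]
  = n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i

a_1 = \frac{ n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i }
           { n \sum_{i=1}^{n} x_i^{2} - \left( \sum_{i=1}^{n} x_i \right)^{2} }
```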

                So, if you look at equation 1, what did we have? We had n times a naught plus a1 times the summation of x sub i equal to the summation of the y values. Since I already know a1, I can find a naught by taking the a1 term to the right-hand side and dividing by n. Keep in mind that a1 is a known quantity now, so that's why I was able to move it to the right-hand side. So, I get the summation of y sub i divided by n, minus a1 times the summation of x sub i divided by n, and you can recognize, hey, the first is nothing but the average value of y, and the second is nothing but a1 times the average value of x. So, I get y bar minus a1 x bar, and that'll be a naught. Okay, and just recall that a1 is the quantity we found before, with its numerator and denominator. So, we can find a1, and then we can find a naught, and that's what gives us the best fit model. But that's not the end of it, because the only thing which we have done is the first derivative part: we took the partial derivatives of Sr with respect to a naught and a1, put them equal to 0, and got a single solution. What does that mean? It means that there is a critical point at a naught equal to this quantity and a1 equal to this quantity. It does not tell us whether it's an absolute minimum. The only thing it is telling us is that, hey, this critical point can be a local minimum, a local maximum, a saddle point, or maybe even none of those. Let's go and see how we convince ourselves that it also corresponds to an absolute minimum.
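Before turning to that question, the two closed-form results (a1 from the elimination step, then a naught as y bar minus a1 x bar) can be tried out numerically. The following is a minimal sketch, not code from the lecture: the function names `fit_line` and `sum_sq_residuals` and the sample data are made up for illustration.

```python
def fit_line(x, y):
    """Return (a0, a1) of the least-squares line y = a0 + a1*x.

    Hypothetical helper: implements the closed-form sums derived above.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        # All x values are identical, so the slope formula divides by zero.
        raise ValueError("all x values are identical; slope is undefined")
    a1 = (n * sum_xy - sum_x * sum_y) / denom
    a0 = sum_y / n - a1 * sum_x / n  # y_bar - a1 * x_bar
    return a0, a1


def sum_sq_residuals(x, y, a0, a1):
    """Sr: sum over i of (y_i - a0 - a1*x_i)^2."""
    return sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))


# Made-up data lying exactly on y = 1 + 2x, so the fit should recover it.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
a0, a1 = fit_line(x, y)
print(a0, a1)  # → 1.0 2.0
print(sum_sq_residuals(x, y, a0, a1))  # → 0.0
```

With noisy data the residual sum will not be zero, but nudging a0 or a1 away from the fitted values should only make `sum_sq_residuals` larger, which is one informal way to see the minimum.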

                The way we know it is the absolute minimum starts with the second derivative test. We're not showing it here because that's a little bit beyond the scope of the course, but when you conduct it, the second derivative test tells us that the a naught and a1 values we found correspond to a local minimum. So, yeah, it's a local minimum, but is it an absolute minimum? How do we say that, hey, it's an absolute minimum? It's because of two things. One is that Sr is a continuous function of a naught and a1. The second is that putting del Sr by del a naught and del Sr by del a1 equal to 0, the first derivative part which we did, yields only one solution. It did not give us multiple solutions; it gave us only one. Based on these two things, the a naught and a1 values we found correspond not only to a local minimum but also to an absolute minimum. And that's the end of the derivation of the linear regression model, and it is the end of the segment.
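The segment leaves out the second derivative test. As a sketch of what it would involve (not part of the lecture, and using the standard two-variable test with D as the assumed name for the determinant of second partial derivatives), the second partials of S_r are constants:

```latex
\frac{\partial^{2} S_r}{\partial a_0^{2}} = 2n, \qquad
\frac{\partial^{2} S_r}{\partial a_1^{2}} = 2 \sum_{i=1}^{n} x_i^{2}, \qquad
\frac{\partial^{2} S_r}{\partial a_0 \, \partial a_1} = 2 \sum_{i=1}^{n} x_i

D = (2n) \left( 2 \sum_{i=1}^{n} x_i^{2} \right) - \left( 2 \sum_{i=1}^{n} x_i \right)^{2}
  = 4 \left[ n \sum_{i=1}^{n} x_i^{2} - \left( \sum_{i=1}^{n} x_i \right)^{2} \right]
  = 4 n \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^{2}
```

Provided the x values are not all identical, D is positive, and the second partial with respect to a naught, 2n, is positive as well, which is the condition for a local minimum.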