CHAPTER 06.03: LINEAR REGRESSION: Derivation: Part 2 of 2
So in order to be able to do that, I will have to take the derivative of this Sr with respect to a0. So del Sr with respect to a0 will be equal to summation, i is equal to 1 to n, 2 times yi, minus a0, minus a1 xi, times -1, equal to 0. You are simply using the chain rule, so it's 2 times whatever is in the brackets, multiplied by the derivative of this expression with respect to a0, which will be just -1. And then del Sr with respect to a1 will be summation, i is equal to 1 to n, 2 times yi, minus a0, minus a1 xi, times the derivative of this particular quantity here, which will be minus xi, equal to 0, because what you are doing is finding out what the derivative of this quantity in the bracket is with respect to a1, and that'll be just minus xi. So we have set up two equations, two unknowns, so we need to be able to figure out how we're going to solve these.
So let's suppose I call this 1, and call this 2. So 1, what does it give me? Let me just expand this summation, term by term. I'll get -2 summation, yi, i is equal to 1 to n, plus 2 summation, a0, i is equal to 1 to n, plus 2 summation, a1 xi, i is equal to 1 to n, equal to 0, that's what I get from the first equation. Now what do I get from the second equation? Second equation, again, I get -2 summation, i is equal to 1 to n, xi times yi, because I have yi here, and xi here, so that's why I get that, then plus 2 summation, i is equal to 1 to n, a0 xi, plus 2 summation, i is equal to 1 to n, a1 times xi squared, equal to 0, because I'll have xi here, xi here, and that's what I will get from there. Now, I can see that 2 is common in here, so I can get rid of 2 here, I'm going to divide both sides by 2. And now, I see that this is a known quantity here, and this is also a known quantity here, but a0 and a1 are not known, a0 is here and a1 is there in both equations, so I'm going to keep those on the left-hand side of my equations, and this I will transfer to the right-hand side of the equations.
So what I will get from there will be, the first equation will turn out to be summation, i is equal to 1 to n, a0, plus summation, i is equal to 1 to n, a1 xi, equal to, not 0, it'll be summation, i is equal to 1 to n, yi. So that's our first equation. The second equation will be summation, i is equal to 1 to n, a0 xi, plus summation, i is equal to 1 to n, a1 xi squared, equal to summation, i is equal to 1 to n, xi yi. So again now, I've got to realize that a0 and a1 are now constants, because they are the values I find by setting the derivatives of the sum of the square of residuals with respect to a0 and a1 equal to 0, and they do not change with i, so I can take those out of the summations. And you can realize, what is this? This is adding a0 n times, because you're doing a0, plus a0, plus a0, plus a0, so you're going to get simply this to be n times a0. So the first equation, again, can be written as n times a0, plus, I can take this a1 outside, because it is a constant, so a1 times summation, i is equal to 1 to n, xi, equal to summation, i is equal to 1 to n, yi. Then again, here, I can take a0 outside, so I get a0 times summation, i is equal to 1 to n, xi, and then, again, a1 can be taken outside, because it's a constant now, so plus a1 times summation, i is equal to 1 to n, xi squared, and then that is equal to summation, i is equal to 1 to n, xi yi.
So basically I have two equations, two unknowns now, and I'm going to write them in the matrix form so that it is clear what the coefficient matrix is, what the unknowns are, and what the right-hand side vector is, and also it will help you to symbolically solve these two equations, two unknowns, whether you're going to use Cramer's rule, or whether you're going to use any kind of Gaussian elimination symbolically.
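Written out in symbols, the two equations just described and the matrix form they lead to would look roughly like this. The notation (a_0, a_1, x_i, y_i, n) follows the lecture; the layout is a reconstruction from the spoken description, not a copy of the board.

```latex
\begin{align*}
n\,a_0 + a_1 \sum_{i=1}^{n} x_i &= \sum_{i=1}^{n} y_i \\
a_0 \sum_{i=1}^{n} x_i + a_1 \sum_{i=1}^{n} x_i^2 &= \sum_{i=1}^{n} x_i y_i
\end{align*}

\[
\begin{bmatrix}
n & \sum_{i=1}^{n} x_i \\
\sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \end{bmatrix}
=
\begin{bmatrix}
\sum_{i=1}^{n} y_i \\
\sum_{i=1}^{n} x_i y_i
\end{bmatrix}
\]
```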
So all of those things can be done, or you can just use your high school algebra to solve these two equations, two unknowns. In many of the books, you'll find that people drop these limits of summation. The reason why they do that is because all the summations are the same, so far as the limits are concerned, but I try to keep them, because I don't want you to lose focus of what this is. Let's suppose you're looking at this quantity: this is a summation of the xis, all the xis are being added together, that's what that stands for. So I keep the limits, but if you are looking at a book, please don't think that I'm doing something different, I'm just doing something which is more complete, showing you all the limits of the summations. So once you solve these two equations, two unknowns, which are written in the matrix form, this is what you're going to get. You're going to get a1 is equal to n times summation, i is equal to 1 to n, xi yi, where n is simply the number of data points which you have, minus summation, i is equal to 1 to n, xi, times summation, i is equal to 1 to n, yi, and then you're going to divide by n times summation, i is equal to 1 to n, xi squared, minus summation, i is equal to 1 to n, xi, the whole thing squared. And then a0, you will get it to be summation, i is equal to 1 to n, yi, divided by n, minus a1 times summation, i is equal to 1 to n, xi, divided by n. So once you have found a1, you can find a0 by this simple formula here, but what is this? This is nothing but the average value of the ys, so you get y-bar, minus a1 times this, which is nothing but the average value of the xs, so you get x-bar. So that's how you will be able to find a0 and a1 by doing linear regression. Now, one of the things which you've got to understand, where people make a lot of mistakes, is this and this.
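In symbols, the two results read out above can be written as follows. This is a reconstruction from the spoken description; the two terms in the denominator of a_1 are presumably the "this and this" being pointed at.

```latex
\[
a_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}
           {n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}
\]

\[
a_0 = \frac{\sum_{i=1}^{n} y_i}{n} - a_1 \frac{\sum_{i=1}^{n} x_i}{n}
    = \bar{y} - a_1 \bar{x}
\]
```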
If you look at this expression, where you are summing the squares of the x values, and this one here, where you are squaring the summation of the x values, these are two different quantities. This is where you have to square each x value first, and then add them all up; this one is where you add all the values of x, and whatever you get as the summation, then you square it. So that's something to be thought about when you are applying these formulas to develop the linear regression formula of y is equal to a0, plus a1 x. And that's the end of this segment.
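As a minimal sketch of how these formulas might be applied, the Python below computes a0 and a1 directly from the summations and keeps the sum of the squares and the square of the sum as two separate quantities. The function name `fit_line` and the four data points are made up for illustration and do not come from the lecture; the data were chosen to lie exactly on y = 1 + 2x so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Return (a0, a1) for y = a0 + a1*x using the summation formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x_sq = sum(x * x for x in xs)  # square each x first, then add them up
    sq_sum_x = sum_x ** 2              # add all the x values first, then square

    denom = n * sum_x_sq - sq_sum_x
    if denom == 0:
        raise ValueError("all x values are equal, so the slope is undefined")

    a1 = (n * sum_xy - sum_x * sum_y) / denom
    a0 = sum_y / n - a1 * sum_x / n    # same as y-bar minus a1 times x-bar
    return a0, a1


# Hypothetical data lying exactly on y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

a0, a1 = fit_line(xs, ys)
print(a0, a1)  # 1.0 2.0

# The two quantities that are easy to mix up
sum_of_squares = sum(x * x for x in xs)  # 1 + 4 + 9 + 16
square_of_sum = sum(xs) ** 2             # (1 + 2 + 3 + 4) squared
print(sum_of_squares, square_of_sum)     # 30.0 100.0
```

For these four points the sum of the squares is 30 while the square of the sum is 100, so swapping them in the denominator of a1 would give a very different slope.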