CHAPTER 06.04: NONLINEAR REGRESSION: Polynomial Model: Derivation: Part 1 of 2
In this segment, we're going to derive the formula for polynomial regression. The statement of the problem is as follows: you are given n data points, (x1, y1) all the way up to (xn, yn), and you want to best fit, based on the least-squares criterion, y = a0 + a1 x + ... + am x^m to the data. Of course, m has to be less than or equal to n minus 1, because otherwise there are more unknown coefficients than data points, and you won't be able to determine a unique polynomial. When m is exactly equal to n minus 1, the regression polynomial will go through all the data points (assuming the x values are distinct), and in that case it is interpolation.
Rather than showing you the derivation for a general mth-order polynomial best fitting n data points, I'm just going to take a second-order polynomial so as to keep the algebra simple, and then you can extend the same concept to any order polynomial, whether it's third-order, fourth-order, or mth-order.
So let me restate the problem. Given (x1, y1), all the way up to (xn, yn), so somebody's giving me n data points, best fit y = a0 + a1 x + a2 x^2 to the data. I'm assuming that n is greater than or equal to 3: the number of data points given to me should be at least 3 in order to do the best fit, and when n is exactly equal to 3, the second-order polynomial will go through the three data points.
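
To make the problem statement concrete, here is a minimal Python sketch of the model being fitted. The function names `poly_model` and `order_is_allowed` are hypothetical, chosen only for illustration; they are not part of the lecture.

```python
def poly_model(coeffs, x):
    """Evaluate a0 + a1*x + ... + am*x**m, where coeffs = [a0, a1, ..., am]."""
    return sum(a * x ** k for k, a in enumerate(coeffs))


def order_is_allowed(m, n):
    """Check the condition from the problem statement: m <= n - 1."""
    return m <= n - 1


# Second-order example with assumed coefficients a0 = 1, a1 = 2, a2 = 3.
value = poly_model([1.0, 2.0, 3.0], 2.0)  # 1 + 2*2 + 3*4
print(value)
print(order_is_allowed(2, 3), order_is_allowed(3, 3))
```

With three data points, a second-order polynomial (m = 2) satisfies the condition, while a third-order one (m = 3) does not.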
So let's go ahead and see how we go about doing this regression. If I look at it from a graphical point of view, somebody's giving me n data points, and they want me to regress a second-order polynomial through those n data points. So let's find out how we can do this.
We go back to the definition of the sum of the square of the residuals. A residual is the difference between the observed value and the predicted value. At a particular value xi which is given to me, the observed value is yi, and the predicted value is whatever I get from the polynomial at that point, so it'll be a0 + a1 xi + a2 xi^2. The difference between the observed value and the predicted value at that point is the residual at that point. I'm going to square the residual, and then add up the squared residuals for all n data points which are given to me; that's how I calculate the sum of the square of the residuals.
Expanding it a little bit, I get Sr = sum, i = 1 to n, of (yi - a0 - a1 xi - a2 xi^2)^2. Now, I want to minimize this summation, to make it as small a number as possible. I know the number cannot be negative, because I'm squaring all the residuals, and I want to make it as small as possible.
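
As a rough illustration of the sum of the square of the residuals for the second-order model, here is a short Python sketch. The function name and the data are made up for this example: the y values were generated from y = 1 + 2x + 3x^2, so those coefficients give a sum of exactly zero.

```python
def sum_of_squared_residuals(xs, ys, a0, a1, a2):
    """Return the sum over i of (y_i - a0 - a1*x_i - a2*x_i**2)**2."""
    total = 0.0
    for x, y in zip(xs, ys):
        predicted = a0 + a1 * x + a2 * x ** 2  # value from the polynomial
        residual = y - predicted               # observed minus predicted
        total += residual ** 2
    return total


# Hypothetical data generated from y = 1 + 2x + 3x^2.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 6.0, 17.0, 34.0]

sr_exact = sum_of_squared_residuals(xs, ys, 1.0, 2.0, 3.0)
sr_other = sum_of_squared_residuals(xs, ys, 0.0, 0.0, 0.0)
print(sr_exact, sr_other)
```

Any other choice of coefficients, such as all zeros, gives a larger sum, which is why the coefficients are chosen to make this quantity as small as possible.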
So I'm going to use my differential calculus knowledge to do that. The things which I can change in order to make this summation as small as possible are a0, a1, and a2; those are the choices I have. So I'm going to take the partial derivative of the sum of the square of the residuals with respect to a0 and put that equal to 0, take the partial derivative with respect to a1 and put that equal to 0, and take the partial derivative with respect to a2 and put that equal to 0. That gives me three equations and three unknowns, so I should be able to solve them for a0, a1, and a2, and that's the whole idea.
Again, keep in mind that setting the derivatives to zero only gives me a point of local maximum or minimum. I'm not going to show you the proof that whatever the values of a0, a1, and a2 turn out to be, they correspond to a local minimum, so that they actually minimize the value of Sr; that's beyond the scope of this particular course.
So let's go ahead and set up these three equations. ∂Sr/∂a0 will be sum, i = 1 to n, of 2 (yi - a0 - a1 xi - a2 xi^2) (-1) = 0. That's 2 times whatever is in the parentheses, times the derivative of the quantity in the parentheses with respect to a0, which is simply -1, and it is set equal to 0 from the first condition.
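
One way to sanity-check the chain-rule expression for the partial derivative with respect to a0 is to compare it against a finite-difference estimate. The sketch below does that in Python; the data, the trial coefficients, and the function names are all assumptions made for illustration.

```python
def sr(xs, ys, a0, a1, a2):
    """Sum of the square of the residuals for y = a0 + a1*x + a2*x**2."""
    return sum((y - a0 - a1 * x - a2 * x ** 2) ** 2 for x, y in zip(xs, ys))


def dsr_da0(xs, ys, a0, a1, a2):
    """Chain rule: 2 * (quantity in parentheses) * (-1), summed over the data."""
    return sum(2.0 * (y - a0 - a1 * x - a2 * x ** 2) * (-1.0)
               for x, y in zip(xs, ys))


# Hypothetical data and an arbitrary trial choice of coefficients.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.2, 5.8, 17.1, 33.9]
a0, a1, a2 = 0.5, 1.0, 2.0

# Central-difference estimate of the same derivative.
h = 1e-6
numeric = (sr(xs, ys, a0 + h, a1, a2) - sr(xs, ys, a0 - h, a1, a2)) / (2 * h)
analytic = dsr_da0(xs, ys, a0, a1, a2)
print(analytic, numeric)
```

The two values should agree closely. At these trial coefficients the derivative is not zero, which simply means this choice of a0 is not yet the one that minimizes the sum.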
Same here: taking the partial derivative with respect to a1, I get ∂Sr/∂a1 = sum, i = 1 to n, of 2 (yi - a0 - a1 xi - a2 xi^2) (-xi), where -xi is the derivative of the quantity in the parentheses with respect to a1, and that I need to put equal to 0.
The last equation results from taking the partial derivative of the sum of the square of the residuals with respect to a2: ∂Sr/∂a2 = sum, i = 1 to n, of 2 (yi - a0 - a1 xi - a2 xi^2) (-xi^2) = 0. Again, it is 2 times whatever is in the parentheses, times the derivative of that quantity with respect to a2, which is simply -xi^2.
What I need to do now is not to expand these summations, but to write them as individual summations so that I can set this up in equation form. So let me go ahead and do that.
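
Before the sums are separated, the three conditions above can be checked numerically. The following Python sketch evaluates all three partial derivatives at once; the function name `gradient_sr` and the data (generated from y = 1 + 2x + 3x^2) are assumptions for illustration only.

```python
def gradient_sr(xs, ys, a0, a1, a2):
    """Return the partial derivatives of Sr with respect to a0, a1, and a2."""
    d0 = d1 = d2 = 0.0
    for x, y in zip(xs, ys):
        r = y - a0 - a1 * x - a2 * x ** 2  # quantity in the parentheses
        d0 += 2.0 * r * (-1.0)             # derivative with respect to a0
        d1 += 2.0 * r * (-x)               # derivative with respect to a1
        d2 += 2.0 * r * (-(x ** 2))        # derivative with respect to a2
    return d0, d1, d2


# Hypothetical data generated from y = 1 + 2x + 3x^2.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 6.0, 17.0, 34.0]

grad_at_fit = gradient_sr(xs, ys, 1.0, 2.0, 3.0)
grad_elsewhere = gradient_sr(xs, ys, 0.0, 0.0, 0.0)
print(grad_at_fit)
print(grad_elsewhere)
```

At the coefficients that generated the data, all three derivatives vanish, which is what the three equations require; at any other trial point, such as all zeros, they generally do not.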