CHAPTER 06.03: LINEAR REGRESSION: Choosing a Criterion for Linear Regression

In this segment we will talk about how we choose a criterion for linear regression. In regression, if somebody gives us n data points, all we are trying to do is answer: how can I best fit a particular curve to them?


This is illustrated in the figure: here is the regression curve, and these are the data points given to us. One way to look at it is through the residual at each data point, which is the difference between the observed value and the predicted value. At any point x sub i, the observed value of y is what is given to us, and the predicted value is what the regression curve gives at that same point. The vertical distance between the two is called the residual. Since we are trying to best fit the data, we would like to make these residuals as small as possible. Let's concentrate on a linear regression model, because that's what we are doing here. Note, however, that if we make one residual small, other residuals might become large.
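To make the definition concrete, here is a minimal Python sketch of a residual. The function names and the candidate line y = 4x - 4 are just for illustration; they are not prescribed by the lecture.

```python
def residual(x_i, y_i, f):
    """Residual at one data point: observed value minus predicted value."""
    return y_i - f(x_i)

# A candidate straight line f(x) = 4x - 4 (illustrative choice)
f = lambda x: 4 * x - 4

# At x = 2, an observed value of 4 lies exactly on the line, so the residual is 0
print(residual(2.0, 4.0, f))
```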


Intuitively, one might think: since making one residual small may make another residual large, a good criterion might be to take the sum of the residuals and make it as small as possible. Let's see how we can go about doing that. I'm going to take an example where we are given four data points, in tabular form and in a plot, and we want to best fit the data to a straight line using this criterion of minimizing the sum of the residuals. Now I'm going to take the straight line y = 4x - 4 and say: let that be the regression curve. I'm not claiming I know it is the regression curve; I'm making a reasonable guess. To figure out whether it meets the criterion of making the sum of the residuals as small as possible, I calculate the residual at each data point. The predicted value at x = 2 is 4, which you can read right off the graph, and the predicted value at x = 3 is 8. The second data point with x = 2 again has a predicted value of 4, and the second data point with x = 3 again has a predicted value of 8, because that is what the straight line gives. So those are the predicted values, and now I calculate my residuals, which are the observed values minus the predicted values. I get 0, -2, 2, and 0.
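The calculation above can be sketched in a few lines of Python. The four (x, y) pairs are not stated explicitly in the transcript; they are reconstructed from the predicted values and residuals of the worked example, so treat them as an assumption.

```python
# Data points reconstructed from the worked example (assumed)
xs = [2, 3, 2, 3]
ys = [4, 6, 6, 8]

# Candidate regression line y = 4x - 4
predicted = [4 * x - 4 for x in xs]

# Residuals: observed minus predicted
residuals = [y - p for y, p in zip(ys, predicted)]

print(residuals)       # the four residuals from the table
print(sum(residuals))  # their sum
```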
And when I add up all of these residuals, what do I get? 0 - 2 + 2 + 0 = 0. So it seems that, by chance, I chose a good curve. We chose y = 4x - 4 as the regression curve; what if I instead chose y = 6 as my regression curve? Would I get a similar sum of the residuals? With the same data, the predicted values are straightforward: since the regression curve is y = 6, the predicted value is 6 at every data point. Now I calculate my residuals, observed minus predicted, and in this case I get -2, 0, 0, and 2. But when I sum the residuals, what do I get? I still get zero.
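The same check for the second candidate line, y = 6, again using the data points reconstructed from the worked example (an assumption):

```python
# Data points reconstructed from the worked example (assumed)
xs = [2, 3, 2, 3]
ys = [4, 6, 6, 8]

# Second candidate: the horizontal line y = 6, so every predicted value is 6
residuals = [y - 6 for y in ys]

print(residuals)       # the residuals for y = 6
print(sum(residuals))  # the sum is again zero, same as for y = 4x - 4
```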


So what does this tell me? Using the criterion of minimizing the sum of the residuals, I have at least two different straight lines for which the sum of the residuals is 0. For both of these regression models the sum of the residuals is minimized; it is 0, and I cannot make the sum of the residuals any smaller than that here. But the regression model is not unique. That makes this a bad criterion, because we are not getting a unique line. So let's see whether we can do any better. One might realize that the reason the sum of the residuals comes out to 0 is that a negative residual cancels a positive residual, both for the y = 4x - 4 model and for the y = 6 model. So maybe, rather than minimizing the sum of the residuals, our criterion should be to minimize the sum of the absolute values of the residuals. Let's see whether that gives us a better criterion. We take the exact same example with the same data, and we again choose y = 4x - 4 as the regression curve that best fits this data. These are the data given to us, and these are the predicted values from the previous example. Calculating the residuals again, we get 0, then -2, then 2, and then 0.


Now when we sum the absolute values of these residuals, I get a value of 4. Again, for y = 6, let's see what happens. In that case I get -2, 0, 0, and 2 as my residuals, because that is what I calculated in the previous example. The only difference is that now I am summing the absolute values of the residuals, and in this case I again get 4. So what I am finding is that for at least two different regression models, the sum of the absolute values of the residuals is 4, and again the regression model is not unique. You might ask: is there a possibility, which we have not looked at, that some straight line gives a sum of absolute residuals strictly less than 4? In fact, if you take the data of this example, you will be unable to find a straight line for which the sum of the absolute values of the residuals is less than 4. So 4 is the minimum sum of absolute residuals you can get, and you have at least two different straight lines that achieve this same minimum value. The regression model is not unique, and if the regression model is not unique, then the sum of the absolute values of the residuals is also a bad criterion. So we have already talked about whether we can use the sum of the residuals as a criterion.
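As a quick check, here is a sketch that evaluates the sum of absolute residuals for both candidate lines, again with the data points reconstructed from the worked example (an assumption):

```python
# Data points reconstructed from the worked example (assumed)
xs = [2, 3, 2, 3]
ys = [4, 6, 6, 8]

def sum_abs_residuals(xs, ys, f):
    """Sum of |observed - predicted| over all data points."""
    return sum(abs(y - f(x)) for x, y in zip(xs, ys))

line1 = lambda x: 4 * x - 4
line2 = lambda x: 6

# Both candidate lines give the same value, so this criterion
# also fails to pick a unique line for this data
print(sum_abs_residuals(xs, ys, line1))
print(sum_abs_residuals(xs, ys, line2))
```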
We have also talked about whether we can use the sum of the absolute values of the residuals as a criterion, and it seems that both of them are not good criteria to use. What we find is that a better criterion is to take the square of each residual and add them all up, which is called the sum of the squares of the residuals. You take each residual, square it, then sum them, and you minimize this, as opposed to minimizing the sum of the residuals or minimizing the sum of the absolute values of the residuals. How we go about finding the minimum of this sum of the squares of the residuals, we will do in the next lesson.
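As a preview of why this criterion behaves better, the sketch below computes the sum of the squares of the residuals for both candidate lines and then for the line obtained from the standard least-squares slope and intercept formulas. Those formulas are derived in a later lesson, not here, so their use is an assumption about where the course is headed; the data points are again reconstructed from the worked example.

```python
# Data points reconstructed from the worked example (assumed)
xs = [2, 3, 2, 3]
ys = [4, 6, 6, 8]
n = len(xs)

def ssr(f):
    """Sum of the squares of the residuals for a candidate model f."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))

# Both earlier candidates now give the same, but non-minimal, value
print(ssr(lambda x: 4 * x - 4))
print(ssr(lambda x: 6))

# Standard least-squares slope (a1) and intercept (a0) formulas
# (assumed here; derived in a later lesson)
xbar = sum(xs) / n
ybar = sum(ys) / n
a1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
      / sum((x - xbar) ** 2 for x in xs))
a0 = ybar - a1 * xbar

# The least-squares line achieves a strictly smaller sum of squares,
# and it is the unique minimizer for this data
print(a1, a0)
print(ssr(lambda x: a1 * x + a0))
```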