CHAPTER 06.03: LINEAR REGRESSION: Choosing a Criterion for Linear Regression

In this segment we will talk about how we choose a criterion for linear regression. In regression, somebody gives us n data points, and all we are trying to do is answer this: how can I best fit a particular curve to those points? It is illustrated in the figure here: this is the regression curve, and these are the data points which are given to us. So
one of the ways to look at it is this: at each data point we have a residual, which is the difference between the observed value and the predicted value. The observed value at a point x_i is what is given to us, and the predicted value is the value of y on the regression curve at that same x_i. The vertical distance between the two is called the residual. Since we are trying to best fit the data, we would like to make these residuals as small as possible. Let us concentrate on a linear regression model, because that is what we are doing here. Now, if we make one residual small, other residuals might become large; we cannot make every residual small at the same time. So intuitively one might think that a good criterion is to take the sum of the residuals and make it as small as possible. Let's see how that works out. I am going to take an example with four data points, and we will best fit the data to a straight line using this criterion of minimizing the sum of the residuals. We are given the data points in tabular form and in a plot. I am going to take the straight line y = 4x - 4 and say: let that be the regression curve. I am not claiming that I know it is the regression curve; I am making a reasonable guess. To figure out whether it does well on the criterion of making the sum of the residuals as small as possible, I calculate the residual at each data point. So
these are the data values which are already given to me: the four points are (2, 4), (3, 6), (2, 6), and (3, 8). Now I have to figure out the predicted values. The predicted value at x = 2 is 4(2) - 4 = 4; you can read it right off the graph. The predicted value at x = 3 is 4(3) - 4 = 8. The predicted value at the second point with x = 2 is again 4, because that is the value on the straight line, and the predicted value at the second point with x = 3 is again 8. So the predicted values are 4, 8, 4, 8. Now I am going to calculate my residuals, which are the observed values minus the predicted values. So I am going to subtract the two.
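As a quick check, this residual calculation can be sketched in a few lines of Python; the four data points and the guessed line are the ones from this example:

```python
# Data points from the example
x = [2, 3, 2, 3]
y = [4, 6, 6, 8]

# Guessed regression line y = 4x - 4
predicted = [4 * xi - 4 for xi in x]                  # [4, 8, 4, 8]
residuals = [yi - pi for yi, pi in zip(y, predicted)]  # observed - predicted
print(residuals)       # [0, -2, 2, 0]
print(sum(residuals))  # 0
```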
I get 0, -2, 2, and 0. When I add all of these residuals, I get 0 + (-2) + 2 + 0 = 0. So it seems I chose a good curve, perhaps by chance. We chose y = 4x - 4 as the regression curve; what if I choose y = 6 as my regression curve instead, and see whether I get a similar value for the sum of the residuals? So in this case
this is the same data which is given to us, and these are the predicted values for y = 6. That is pretty straightforward: because the regression curve is y = 6, the predicted value is 6 at every data point. Now I calculate my residuals, observed minus predicted, and I get -2, 0, 0, and 2. But when I sum the residuals, what do I get? I still get zero. So what does this tell me? While using the criterion of minimizing the sum of the residuals, I have at least two different straight lines for which the sum of the residuals is 0. For both of these regression models the sum of the residuals comes out to be zero; the sum of the residuals is minimized, in this case to 0, and I cannot make these residual sums any smaller than 0. But the regression model is not unique. So what we are finding out is that the regression model is not unique, and that makes this particular criterion a bad criterion, because we are not getting a unique line. Let's see whether we can do any better. One might realize that the reason the sum of the residuals came out to be 0 is that a negative residual and a positive residual cancel each other, both for this model and for the y = 6 model. So maybe, rather than taking the sum of the residuals as the criterion, we should take a criterion which minimizes the sum of the absolute values of the residuals. Let's see whether that gives us a better criterion. So
we are going to take the exact same example with the same data, and use the criterion of minimizing the sum of the absolute values of the residuals. We have exactly the same table and the same figure as before, and we again choose y = 4x - 4 as the regression curve which is supposed to best fit this data. This is the data which is given to us, and these are the predicted values from the previous example. Again we calculate the residuals: at the first point 0, then -2, then 2, and then 0. Now when I sum the absolute values of the residuals, I get |0| + |-2| + |2| + |0| = 4. Now again, for y = 6,
let's see what happens there. In that particular case I am going to get -2, 0, 0, and 2 as my residuals, because that is what I calculated in the previous example. The only difference is that I am now calculating the sum of the absolute values of the residuals, and in this case I get |-2| + |0| + |0| + |2| = 4. So what I am finding out is that for at least two different regression models, the sum of the absolute values of the residuals is 4. But again, the regression model is not unique. You might say: if the sum of the absolute residuals is 4 for these two lines, is there a possibility, which we have not looked at, that some straight line gives a sum of absolute residuals strictly less than 4? In fact, if you take the data of this example, you will be unable to find a straight line for which the sum of the absolute values of the residuals is less than 4.
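To probe this claim, here is a small Python sketch that evaluates the sum of the absolute residuals for both lines and then brute-forces a coarse grid of integer intercepts and slopes. The grid search is a hypothetical illustration, not a proof, but it shows that no line in the grid beats 4:

```python
# Data points from the example
x = [2, 3, 2, 3]
y = [4, 6, 6, 8]

def sum_abs_residuals(a0, a1):
    """Sum of |observed - predicted| for the line y = a0 + a1*x."""
    return sum(abs(yi - (a0 + a1 * xi)) for xi, yi in zip(x, y))

print(sum_abs_residuals(-4, 4))  # y = 4x - 4  ->  4
print(sum_abs_residuals(6, 0))   # y = 6       ->  4

# Coarse integer grid over intercepts and slopes; none does better than 4
best = min(sum_abs_residuals(a0, a1)
           for a0 in range(-10, 11) for a1 in range(-10, 11))
print(best)  # 4
```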
So what we are finding out is that this is in fact the minimum sum of the absolute residuals you can get, and you have at least two different straight lines which attain that same minimum value. So the regression model is not unique, and if the regression model is not unique, then the sum of the absolute values of the residuals is also a bad criterion. So
we have already talked about whether we can use the sum of the residuals as a criterion, and we have talked about whether we can use the sum of the absolute values of the residuals as a criterion, and it turns out that both of them are bad criteria to use. What we find is that a better criterion is to take the square of each residual: you take each residual, square it, and add them all up, which is why it is called the sum of the squares of the residuals. You would then minimize this sum of the squares of the residuals, as opposed to minimizing the sum of the residuals or the sum of the absolute values of the residuals. How we go about finding the minimum of this sum of the squares of the residuals, we will do in the next lesson.
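As a sketch of how this criterion behaves on the same data, the Python below evaluates the sum of the squares of the residuals for both earlier guesses and, as a preview only, applies the standard closed-form least-squares slope and intercept formulas (these are assumed here, not derived; the derivation is the subject of the next lesson):

```python
# Data points from the example
x = [2, 3, 2, 3]
y = [4, 6, 6, 8]
n = len(x)

def sum_sq_residuals(a0, a1):
    """Sum of (observed - predicted)^2 for the line y = a0 + a1*x."""
    return sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))

print(sum_sq_residuals(-4, 4))  # y = 4x - 4  ->  0 + 4 + 4 + 0 = 8
print(sum_sq_residuals(6, 0))   # y = 6       ->  4 + 0 + 0 + 4 = 8

# Standard least-squares formulas (assumed; derived in the next lesson)
a1 = (n * sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y)) / (
     n * sum(xi * xi for xi in x) - sum(x) ** 2)
a0 = sum(y) / n - a1 * sum(x) / n
print(a0, a1)                    # 1.0 2.0, i.e. y = 2x + 1
print(sum_sq_residuals(a0, a1))  # 4.0, smaller than either guess
```

Unlike the first two criteria, the squared criterion separates the two guessed lines from the best one: a single line achieves the smallest sum, which is exactly the uniqueness property the earlier criteria lacked.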