CHAPTER 06.05: ADEQUACY OF REGRESSION MODELS: Check Two: Standard Error of Estimate
In this segment, we're going to talk about the second check of whether a particular regression model is adequate. In the first check, we talked about how to plot the data and then plot the corresponding regression model. In this check, we're going to talk about the standard error of estimate. What is the standard error of estimate? It is very similar to how you calculate your standard deviation. The standard deviation is for data that is simply given to you as a single set of values, whereas the standard error of estimate is for when you have y versus x data given to you. The standard error of estimate is found by calculating the sum of the squares of the residuals, Sr, dividing it by n minus 2, and then taking the square root. If you remember, your standard deviation was simply the square root of St divided by n minus 1. The reason we had n minus 1 is that you have already lost one degree of freedom by calculating the average value, because St requires the difference between the observed values and the average value. In the case of the standard error of estimate, there's a similar thing. Now you have the sum of the squares of the residuals, where each residual is the difference between the observed value and the predicted value. So to calculate the sum of the squares of the residuals, we subtract the value predicted by the regression model from the observed value, square the difference, and add all of the squares together. That sum is what we then divide by n minus 2.
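The two formulas being compared here can be written out side by side. This is a sketch in assumed notation, since the slide itself is not part of the transcript: sigma for the standard deviation, s_{y/x} for the standard error of estimate, and a straight-line model with intercept a_0 and slope a_1.

```latex
\[
\sigma = \sqrt{\frac{S_t}{n-1}}, \qquad S_t = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2
\]
\[
s_{y/x} = \sqrt{\frac{S_r}{n-2}}, \qquad S_r = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2
\]
```

The only structural differences are what each observed value is compared against (the average versus the model prediction) and how many degrees of freedom are subtracted in the denominator.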
Now, why is this n minus 2 here? It is because we are losing two degrees of freedom through the calculation of a0 and a1. That's how the standard error of estimate is calculated. Now, let's go ahead and see what this means so far as the example is concerned. In the example, our regression model is already here, which we found by using the formulas for the straight-line regression model, and this is the observed data which is given to us. What we're going to do is use the values of Ti to calculate the predicted values. These are my predicted values based on this particular model right here, where a0 is 0.26393 and this one here is a1. For example, if I put Ti equal to -340 into this formula, I get 2.7357 as my predicted value. So 2.45 is my observed value, 2.7357 is my predicted value, and the difference between the two is my residual at that particular point. What you are basically doing is calculating these residuals at the six data points which have been chosen. Having calculated those residuals, all I have to do is square each of those six numbers and add them up, and that turns out to be the sum of the squares of the residuals.
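The steps just described (predict, subtract, square, add) can be sketched in a few lines of Python. The data below is hypothetical and chosen only for illustration; it is not the table from the lecture, which the transcript does not reproduce. The function names are also illustrative choices.

```python
def fit_line(x, y):
    """Least-squares straight line y = a0 + a1*x; returns (a0, a1)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a0 = sum_y / n - a1 * sum_x / n
    return a0, a1


def residuals(x, y, a0, a1):
    """Observed value minus predicted value at each data point."""
    return [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]


def sum_of_squared_residuals(res):
    """Sr: square each residual, then add the squares up."""
    return sum(r * r for r in res)


# Hypothetical six-point data set (NOT the lecture's data).
x_data = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0]
y_data = [1.2, 2.1, 2.8, 4.1, 4.9, 6.2]

a0, a1 = fit_line(x_data, y_data)
res = residuals(x_data, y_data, a0, a1)
sr = sum_of_squared_residuals(res)
print(f"a0 = {a0:.4f}, a1 = {a1:.4f}, Sr = {sr:.4f}")
```

Because the line is fitted with an intercept, the residuals themselves add up to zero (to rounding), which is why they are squared before being summed.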
Now, to calculate my standard error of estimate, where alpha is the dependent variable and T is the independent variable, I'm going to divide Sr by n minus 2 and take the square root. There are six data points, and two is the number of degrees of freedom I have lost by using the straight-line regression: the intercept and the slope. Putting Sr in here, this is what I get as the value of the standard error of estimate. Now the question arises: what does this mean in plain terms? Here we have taken the six data points which are given to us, and this red line which you are seeing is your regression model. What we are basically saying is that this value of the standard error of estimate, 0.25141, is, on the average, the difference between the observed and the predicted values. What we have also done here is draw lines at plus and minus the standard error of estimate from the regression line; those are the two red lines which you are seeing here. Actually, this should be 2 times: this line is drawn at plus 2 times that number and that line at minus 2 times. So what we are basically saying is that between plus and minus 2 times S-alpha-T, which is the standard error of estimate, we expect to find 95 percent of the values.
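A minimal sketch of this step, assuming Sr has already been computed. The function and variable names are illustrative; the numbers 0.25141, 2.45, and 2.7357 are the ones quoted in the lecture, while Sr itself is not stated in the transcript and is therefore left as an input.

```python
import math


def standard_error_of_estimate(sr, n, n_params=2):
    """sqrt(Sr / (n - n_params)); n_params is 2 for a straight line (a0, a1)."""
    if n <= n_params:
        raise ValueError("need more data points than fitted parameters")
    return math.sqrt(sr / (n - n_params))


def inside_band(observed, predicted, s, k=2.0):
    """True if the observed value lies within +/- k*s of the prediction."""
    return abs(observed - predicted) <= k * s


# Value of the standard error of estimate quoted in the lecture.
s_lecture = 0.25141
half_width = 2 * s_lecture  # the plus/minus range around the regression line
print(f"band half-width = {half_width:.5f}")

# The data point at T = -340: observed 2.45, predicted 2.7357.
print(inside_band(2.45, 2.7357, s_lecture))
```

Keeping `n_params` as an argument is a design choice so the same function would cover models that fit more than two constants; the lecture itself only discusses the straight-line case.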
So what we're basically saying is that the value of alpha is expected to be accurate within plus or minus 2 times 0.25141, which gives us plus or minus 0.50282. Now, there's another way of looking at the standard error of estimate: we can calculate something called a scaled residual. For a scaled residual, you simply take the residual and divide it by the standard error of estimate which we just calculated. The residual is the observed value minus the predicted value. So we are calculating these residuals at each data point by subtracting the predicted value from the observed value and dividing by the standard error of estimate, and then we expect 95 percent of the scaled residuals to be between -2 and +2. It goes back to what we said about expecting alpha to be accurate within plus or minus 2 times the standard error of estimate, so this is just another way of looking at the same thing. Going back to how we calculate the scaled residuals: we have already calculated the standard error of estimate, and the residuals were calculated in the previous slide as the difference between the observed values and the predicted values.
So the scaled residual is simply this residual here divided by the standard error of estimate. For example, this scaled residual right here is nothing but -0.28571 divided by 0.25141, which gives us a scaled residual of -1.1364. Similarly, you calculate all the other scaled residuals: the first, second, third, fourth, fifth, and sixth for the six data points. What you find is that all six scaled residuals are in the range of -2 to +2. So you don't even have to count whether 95 percent of the scaled residuals are between -2 and +2, because all of them are. Hence, this shows that the scaled residuals are in the proper range. And that's the end of this segment.
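The scaled-residual check can be sketched as follows. The first residual (-0.28571) and the standard error of estimate (0.25141) are the values quoted in the lecture; the remaining residuals are not given in the transcript, so the second list below is hypothetical and only demonstrates the counting step. The function names are illustrative.

```python
def scaled_residuals(res, s):
    """Divide each residual by the standard error of estimate s."""
    return [r / s for r in res]


def fraction_within(scaled, limit=2.0):
    """Fraction of scaled residuals lying in [-limit, +limit]."""
    return sum(1 for v in scaled if abs(v) <= limit) / len(scaled)


# Residual and standard error of estimate quoted in the lecture.
first_scaled = scaled_residuals([-0.28571], 0.25141)[0]
print(f"first scaled residual = {first_scaled:.4f}")

# Hypothetical scaled residuals, only to show the 95 percent check.
example = [-1.5, 0.3, 2.5, 1.0]
print(f"fraction within +/-2 = {fraction_within(example):.2f}")
```

With only six data points, as in the lecture, the 95 percent criterion effectively means every scaled residual has to fall between -2 and +2.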