Adequacy of Regression Models
© 2007 Autar Kaw, Jamie Trahan
University of South Florida
United States of America
kaw@eng.usf.edu

Introduction
This worksheet allows you to determine whether a straight-line regression model adequately describes n data points (x1,y1), (x2,y2), (x3,y3), ..., (xn,yn). Four different checks are used to decide whether the model is adequate. Our discussion, although limited to straight-line models, is applicable to any regression model.

Adequacy Check #1: Plot the straight-line regression model against the data to visually inspect how well the line fits the data.
Adequacy Check #2: Calculate the coefficient of determination, r^2. This value quantifies the proportion of the original uncertainty in the data that is explained by the straight-line model.
Adequacy Check #3: Determine whether the residuals, as a function of x, show nonlinearity. If they do, this is an indication that the model is not adequate.
Adequacy Check #4: Determine whether at least 95% of the scaled residuals are within [-2,2]. If so, this is an indication that the model may be adequate.

Please note that the above checks are not a complete test of the adequacy of regression models. Other tests include testing whether y is indeed dependent on x, finding confidence intervals of the constants of the model, etc.
To learn more about the quality of a fitted linear regression model, see the worksheet on the adequacy of regression models.

Section 1: Input Data
Below are the input parameters to begin the simulation. This is the only section that requires user input.

Input Parameters:
X = array of x values
Y = array of y values
n = number of data points

NOTE: The user has the option of choosing his or her own X and Y arrays (e.g. X := (1,7,13,19,25), Y := (1,49,169,361,625)).
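The arrays in the note above are written in shorthand rather than in Maple syntax. As a minimal sketch, assuming the same array(...) constructor that the rest of this worksheet uses, a user-supplied data set might be entered as shown here. These lines would take the place of the loops that generate X and Y, and n would have to be set to the number of points supplied instead of 50.

```maple
# Hypothetical user-supplied data: the five points from the note above.
# array(lo..hi, list) builds an indexed array of the kind used in this worksheet.
X := array(1..5, [1, 7, 13, 19, 25]):
Y := array(1..5, [1, 49, 169, 361, 625]):
n := 5:   # must equal the number of data points
```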
We are instead showing a large data set generated within a loop to better illustrate model adequacy.

restart;
X :=array(1..50):
for i from 1 by 1 to 50 do
X[i]:=evalf(i/2):
end do:
print(X);
Y:=array(1..50):
for i from 1 by 1 to 50 do
Y[i]:=evalf((X[i])^2);
end do:
print(Y);
n:=50;

Section 2: Finding the straight-line model
We will use Maple's Fit command to fit the data to a straight line.

with(Statistics):
y:=Fit(a*x+b,X,Y,x):
f:=unapply(y,x):
print(`The straight-line regression model is y = `, y );

Section 3: Checking for adequacy of the model

Adequacy Check #1
Below, the linear regression model is plotted against the data points. See whether the straight-line regression model visually explains the data.

observed:=[seq([X[i],Y[i]],i=1..n)]:
predicted:=f(x):
plot([observed,predicted],x=X[1]..X[n],style=[POINT,LINE],labels=["x","y"],symbol=CIRCLE,symbolsize=20,title="y vs. x",legend=["Data Points","Predicted Curve"]);

Adequacy Check #2
In this section we will calculate the coefficient of determination, r^2:

r^2 = (St - Sr)/St

where
Sr = the sum of the squares of the residuals (a value that quantifies the spread around the regression line)
and
St = the sum of the squares of the deviations from the mean (a value that measures the spread between the data and its mean).

This value describes the proportion of variation in the response data that is explained by the regression model. When all the points in a data set lie on the regression model, the largest possible value, r^2 = 1, is obtained, while the minimum possible value, r^2 = 0, is obtained when the straight-line regression model is a constant line. (With only one data point, St = 0 and r^2 is undefined.)
Note: Please see the Adequacy of Models worksheet for limitations in the use of r^2.

Calculation of r^2:
Sum of the squares of the differences between the observed values and their average, St:
with(Statistics):
St:=0:
for i from 1 by 1 to n do
St:=St+(Y[i]-Mean(Y))^2;
end do:
St;

Sum of the squares of the residuals, Sr:
Sr:=0:
for i from 1 by 1 to n do
Sr:=Sr+(Y[i]-f(X[i]))^2;
end do:
Sr;

Coefficient of determination, r^2:
r2:=(St-Sr)/St;

Adequacy Check #3
In this section, the residuals, which are the differences between the observed values and the predicted values (yi - a0 - a1 xi), are found and then plotted as a function of x to check for increasing variance, outliers, or nonlinearity.

Calculating the residuals:
residuals:=array(1..n):
for i from 1 by 1 to n do
residuals[i]:=Y[i]-f(X[i]);
end do:
print(`The residuals are `,residuals);

Plotting the residuals:
residual:=[seq([X[i],residuals[i]],i=1..n)]:
plot(residual,x=X[1]..X[n],style=point);

Adequacy Check #4
In this section, the scaled residuals SR (the ratio of each residual to the standard error of estimate) are calculated. SR is given by

SR[i] = (y[i] - f(x[i])) / sqrt(Sr/(n-m))

where f(x) is the regression function, n is the number of data points, and m is the number of degrees of freedom lost (i.e. the number of constants in the model; for a straight-line model, m = 2).

Calculation of SR:
m:=2:
SR:=array(1..n):
for i from 1 by 1 to n do
SR[i]:=residuals[i]/(sqrt(Sr/(n-m)));
end do:
print(SR);

Calculating the percent within range:
count:=0:
for i from 1 by 1 to n do
if -2<=SR[i] and SR[i]<=2 then
count:=count+1;
end if:
end do:
if (count/n)>=0.95 then
print((count/n)*100,`% of SR values fall between -2 and 2, therefore at least 95% of SR values are in the [-2,2] range`);
else
print((count/n)*100, `% of SR values fall between -2 and 2, therefore fewer than 95% of SR values are in the [-2,2] range`);
end if;

Plotting the scaled residuals:
scaledresid:=[seq([X[i],SR[i]],i=1..n)]:
plot([2,-2,scaledresid],x=X[1]..X[n],style=[line,line,point],color=[blue,blue,green],title="Scaled Residuals");

References
[1] Autar Kaw, Holistic Numerical Methods Institute, http://numericalmethods.eng.usf.edu/mws. See:
Adequacy of Regression models
How does Linear Regression work?

Conclusion
Using Maple, we are able to check the adequacy of a linear regression model.

Question 1: Given data
show if regressing the data to y = a0 + a1 x is adequate.

Question 2: Theoretical considerations assume that the rate of flow from a fire hose is proportional to some power of the nozzle pressure. However, a scientist believes that the simpler linear regression model is adequate. Determine whether the linear model is adequate.

Question 3: Given (x1,y1), (x2,y2), (x3,y3), ..., (xn,yn), the coefficient of determination, r^2, is zero if the straight-line regression model turns out to be a constant line. Prove that the constant line is the average of the y-values, that is, y = (y[1] + y[2] + ... + y[n])/n.

Legal Notice: The copyright for this application is owned by the author(s). Neither Maplesoft nor the author are responsible for any errors contained within and are not liable for any damages resulting from the use of this material. This application is intended for non-commercial, non-profit use only. Contact the author for permission if you wish to use this application in for-profit activities.