CHAPTER 06.04: NONLINEAR REGRESSION: Exponential Model: Transformed Data: Derivation
In this segment, we're going to talk about the exponential model of regression and see how we can find the constants of an exponential regression model. The approach we're going to take here is the transformed data approach, where you transform the data so that it can be regressed by a linear regression model. So let's go ahead and see what that all means, rather than pointing it out right in the beginning. Somebody is giving us n data points, so let's suppose somebody's giving me (x1, y1) all the way up to (xn, yn), and says best fit y is equal to a e to the power b x to the data. Again, we are talking about best fit in the least squares sense. So somebody's giving us n data points and asking us to best fit y is equal to a e to the power b x to the data, but what we want to do is use the transformed data. The reason why we want to use the transformed data is that the sum of the square of the residuals for this particular model looks like this: Sr is the summation, i is equal to 1 to n, of yi, minus a e to the power b xi, whole squared. In another segment, I showed that once you try to minimize this sum of the square of the residuals with respect to a and b, because that's how you find a and b by the least squares method, it turns out that you have to solve a nonlinear equation. What I'm going to show you now circumvents that step. I'm not telling you that it solves the original problem, and statistically it's not an optimal thing to do, but it does make it simpler to find the constants of the model, a and b. So be aware that when you transform the data in order to be able to do regression, you are not using a statistically optimal way of finding the constants of the model.
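Written out, the sum of the square of the residuals and the two conditions for its minimum look roughly as follows. This is a sketch of the standard least squares conditions, not a copy of the other segment's derivation. The first condition can be solved for a in terms of b, and substituting that into the second leaves a single nonlinear equation in b.

```latex
\begin{align*}
S_r &= \sum_{i=1}^{n} \left( y_i - a e^{b x_i} \right)^2 \\
\frac{\partial S_r}{\partial a} &= \sum_{i=1}^{n} 2 \left( y_i - a e^{b x_i} \right) \left( -e^{b x_i} \right) = 0 \\
\frac{\partial S_r}{\partial b} &= \sum_{i=1}^{n} 2 \left( y_i - a e^{b x_i} \right) \left( -a x_i e^{b x_i} \right) = 0 \\
a &= \frac{\sum_{i=1}^{n} y_i e^{b x_i}}{\sum_{i=1}^{n} e^{2 b x_i}}
\quad \text{(from the first condition)}
\end{align*}
```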
So let's go ahead and see what this transformed data business means. The model is y is equal to a e to the power b x, and I know that if I find the values of a and b by taking the partial derivatives of Sr with respect to a and b, I'm going to get a nonlinear equation. What I do instead is take y is equal to a e to the power b x and take the natural log of both sides. So this is where the transformation is occurring. Keep in mind the transformation is not occurring in the model, but in the data. You are transforming the data, not the model, and I'll make that clear in a little bit. Taking the log of both sides, you get log of y is equal to log of a e to the power b x. Using the formula for the log of a product, that gives you log of a, plus log of e to the power b x, and so log of y is equal to log of a, plus b x. This is just by taking the log of both sides of the model. So let me write this down again here: log of y is equal to log of a, plus b x. Now let me call log of y z, let me call log of a a0, and let me call b a1. x stays as x, and I can write z is equal to a0, plus a1 x. What you are finding out is that z versus x is a linear model. That means you can now use the simple relationships for a0 and a1 from linear regression, by transforming your data from y versus x to z versus x. Now how do you transform the data from y versus x to z versus x? Simply by taking the log of the y values: zi is equal to log of yi.
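The steps just described can be written compactly as below. This assumes that log means the natural log (which is what makes the log of e to the power b x equal to b x) and that every y value is positive, so that its log exists.

```latex
\begin{align*}
y &= a e^{b x} \\
\ln y &= \ln\left( a e^{b x} \right) = \ln a + \ln e^{b x} = \ln a + b x \\
z &= a_0 + a_1 x, \qquad z = \ln y, \quad a_0 = \ln a, \quad a_1 = b \\
z_i &= \ln y_i, \qquad i = 1, \ldots, n
\end{align*}
```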
So that's how you do the transformation: z is equal to log of y. Because your regression model is linear between z and x, you take the log of the yi values to find your zi values. Once you have converted your y versus x data to z versus x data, you can use the linear regression formulas for a0 and a1. Those are very simple formulas, which will be in terms of the zis and xis. Once you have found a0 and a1 from the linear regression formulas, how do I get my original constants for the model? By going back to where we did the transformation. log of a is a0, so that simply implies that a is nothing but e to the power a0, and b is the same as a1, because I just used a1 as a substitution for b. So what we are basically saying again is that we're going to go from our y versus x exponential model to a linear regression model of z versus x. We already know what our x values are, and we find our z values by taking the log of each of the n y values which are given to us. We find a0 and a1 by using the linear regression formulas, and then a is equal to e to the power a0 and b is equal to a1, which gives us the constants of the exponential model, y is equal to a e to the power b x. So keep in mind that the transformation is done to the data, not to the model; you're not doing something called linearizing the model, or anything like that.
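The whole procedure can be sketched in a few lines of Python. This is a minimal illustration, not the segment's own code: the function name `fit_exponential_transformed` is hypothetical, the a0 and a1 expressions are the standard least squares straight-line formulas (the segment refers to them but does not state them), and the sample data is made up to lie exactly on y = 2 e^(0.5 x) so the result is easy to check.

```python
import math


def fit_exponential_transformed(x, y):
    """Fit y = a*exp(b*x) by linear regression of z = ln(y) on x.

    Hypothetical helper for illustration; requires every y value > 0.
    """
    n = len(x)
    # Transform the data (not the model): z_i = ln(y_i)
    z = [math.log(yi) for yi in y]

    sum_x = sum(x)
    sum_z = sum(z)
    sum_xz = sum(xi * zi for xi, zi in zip(x, z))
    sum_x2 = sum(xi * xi for xi in x)

    # Standard straight-line least squares formulas for z = a0 + a1*x
    a1 = (n * sum_xz - sum_x * sum_z) / (n * sum_x2 - sum_x ** 2)
    a0 = sum_z / n - a1 * sum_x / n

    # Undo the substitutions: a = exp(a0), b = a1
    return math.exp(a0), a1


# Made-up data lying exactly on y = 2*exp(0.5*x)
x = [0.0, 1.0, 2.0, 3.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]

a, b = fit_exponential_transformed(x, y)
print(a, b)
```

Because this data has no scatter, z versus x is exactly a straight line and the fit recovers a and b up to floating-point error. With scattered data the constants would generally differ from the ones that minimize the original sum of the square of the residuals.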
Your model is still an exponential model, but for mathematical convenience you are going from the y versus x exponential model to a z versus x linear regression model. Once you have the constants of the linear regression model, you calculate the constants of the exponential model by undoing the substitutions, and you get back your a and b. Again, keep in mind that whatever values of a and b you get by this transformed data method are not statistically optimal, because you are not doing the least squares on the original y and x; you are doing the least squares between z and x. What does that mean? It means that the sum of the square of the residuals which you are minimizing is the summation, i is equal to 1 to n, of zi, minus a0, minus a1 times xi, whole squared. Here zi is log of yi, a0 is nothing but log of a, and a1 is nothing but b. So this is what you are minimizing. You are not minimizing the summation, i is equal to 1 to n, of yi, minus a e to the power b xi, whole squared, which is what you should have been minimizing to find a and b. The only reason you are minimizing the first sum is mathematical convenience, and that's why the result is not statistically optimal. The second sum is not as mathematically convenient, but in today's world you can always find the values of a and b by solving a nonlinear equation. And that's the end of this segment.
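To see the difference between the two sums on actual numbers, here is a small Python sketch. Everything in it is an assumption for illustration: the data is made up, and the helper names `fit_transformed`, `sr_original`, and `sr_transformed` are hypothetical. It does not solve the full nonlinear problem. It only holds b at the transformed-data value and picks the a that minimizes the original sum for that b, which is already enough to show that the transformed-data constants do not in general minimize the original sum.

```python
import math


def fit_transformed(x, y):
    """Transformed-data fit of y = a*exp(b*x) (hypothetical helper)."""
    n = len(x)
    z = [math.log(yi) for yi in y]
    sx, sz = sum(x), sum(z)
    sxz = sum(xi * zi for xi, zi in zip(x, z))
    sxx = sum(xi * xi for xi in x)
    a1 = (n * sxz - sx * sz) / (n * sxx - sx ** 2)
    a0 = sz / n - a1 * sx / n
    return math.exp(a0), a1


def sr_original(x, y, a, b):
    """Sum of (y_i - a*exp(b*x_i))^2, the sum on the original data."""
    return sum((yi - a * math.exp(b * xi)) ** 2 for xi, yi in zip(x, y))


def sr_transformed(x, y, a, b):
    """Sum of (ln y_i - ln a - b*x_i)^2, the sum on the transformed data."""
    return sum((math.log(yi) - math.log(a) - b * xi) ** 2
               for xi, yi in zip(x, y))


# Made-up, scattered data that is roughly exponential
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.9, 7.0, 21.5, 52.0]

a_t, b_t = fit_transformed(x, y)

# For b held at b_t, the a that minimizes the ORIGINAL sum of squares
a_star = (sum(yi * math.exp(b_t * xi) for xi, yi in zip(x, y))
          / sum(math.exp(2.0 * b_t * xi) for xi in x))

print("transformed fit:  a =", a_t, " b =", b_t)
print("original Sr at (a_t, b_t):      ", sr_original(x, y, a_t, b_t))
print("original Sr at (a_star, b_t):   ", sr_original(x, y, a_star, b_t))
print("transformed Sr at (a_t, b_t):   ", sr_transformed(x, y, a_t, b_t))
print("transformed Sr at (a_star, b_t):", sr_transformed(x, y, a_star, b_t))
```

The pair (a_t, b_t) gives the smaller sum on the transformed data, while (a_star, b_t) gives an original sum that is no larger than the one at (a_t, b_t). Each sum of squares has its own minimizer.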