CHAPTER 06.04: NONLINEAR REGRESSION: Saturation Growth Model: Transformed Data: Derivation

In this segment, we're going to talk about how to derive the formulas for the saturation growth model. To find the constants of the model, we're going to use the transformed data approach, and we'll talk about what that means in a little bit. Let's go ahead and state the problem. We are given data points (x1, y1), (x2, y2), all the way up to (xn, yn), and somebody asks us to fit y = ax/(b + x) to the data, best fit again in the least squares sense. The reason this is called a saturation growth model is what happens as x becomes large. If I divide both the numerator and the denominator by x, I get y = a/(b/x + 1). As x approaches infinity, b/x goes to 0, so y approaches just a. So as x becomes a very large number, y saturates to a, and that's why it's called a saturation growth model. The curve also starts at the origin: at x = 0, y is 0, and as x goes to infinity, y goes to a. So you start from 0 and go up to a. If you look at it from a graphical point of view, data suited to a saturation growth model will most likely start from 0 right at the origin, rise, and then level off, becoming asymptotically equal to a.
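As a quick numerical check of the saturation behavior just derived, here is a minimal Python sketch. The constants a = 2 and b = 0.5 are hypothetical values chosen only for illustration; the model is evaluated at increasing x to show y starting at 0 and creeping up toward a without passing it.

```python
def saturation_growth(x: float, a: float, b: float) -> float:
    """Saturation growth model y = a*x / (b + x)."""
    return a * x / (b + x)

a, b = 2.0, 0.5  # hypothetical constants for illustration
for x in [0.0, 1.0, 10.0, 1000.0, 1e6]:
    # y starts at 0 when x = 0 and approaches a as x grows
    print(f"x = {x:>9g}  y = {saturation_growth(x, a, b):.6f}")
```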
So let's go ahead and see how we can find the constants of the model, which in this case are a and b; those are the two constants we need to find for the saturation growth model. We are going to use a transformed data approach, which basically means we're going to use our knowledge of linear regression to find a and b. Keep in mind that the model itself is still nonlinear; we are transforming the data only for the sake of convenience. Here is how we do it. If y = ax/(b + x), then taking the reciprocal of both sides gives 1/y = (b + x)/(ax), which splits into 1/y = (b/a)(1/x) + 1/a. The reason I've made this change is that if I call 1/y = z, b/a = a1, 1/x = w, and 1/a = a0, then what I have is z = a0 + a1 w. So by making this transformation, z versus w is a linear model, and I can run a linear regression between z and w, not between y and x. Between y and x, we still have a saturation growth model. So this is only for mathematical convenience: if I convert my y values to z values by taking 1/yi, and get my w values by taking 1/xi, then z versus w is linear. The reason I want that is that there are simple closed-form equations for a0 and a1 in a linear regression model. Once I have found a0 and a1, I can then find a and b. So how do I get my z and w values?
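The substitution can be checked numerically. This Python sketch assumes hypothetical constants a = 2 and b = 0.5 and noise-free data; it computes z = 1/y and w = 1/x and measures how far the points fall from the line z = a0 + a1 w with a0 = 1/a and a1 = b/a. Note that a data point at x = 0 (where the model gives y = 0) cannot be transformed, since both reciprocals are undefined, so such points have to be left out.

```python
def transform(xs, ys):
    """Return z_i = 1/y_i and w_i = 1/x_i (requires nonzero x_i and y_i)."""
    zs = [1.0 / y for y in ys]
    ws = [1.0 / x for x in xs]
    return zs, ws

a, b = 2.0, 0.5                      # hypothetical constants
xs = [0.5, 1.0, 2.0, 4.0, 8.0]       # x = 0 is excluded: 1/x is undefined
ys = [a * x / (b + x) for x in xs]   # noise-free model values
zs, ws = transform(xs, ys)

a0, a1 = 1.0 / a, b / a              # intercept and slope implied by the substitution
max_gap = max(abs(z - (a0 + a1 * w)) for z, w in zip(zs, ws))
print(f"largest distance from the line z = a0 + a1*w: {max_gap:.2e}")
```

Only rounding error separates the transformed points from the line, which is what the algebra predicts.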
The zi values are nothing but 1/yi, so all I have to do is take my y values and invert each one to get the z values. Similarly, to get the w values, I take the reciprocal of each x value, wi = 1/xi. Doing a linear regression of z versus w then gives me a0 and a1, but these are not the constants of my original model. The constants of my original model are a and b, and I can recover them from the two substitutions I made: a1 = b/a and a0 = 1/a. Since 1/a = a0, a is nothing but 1/a0, so once I find a0, I take its inverse to get a. And since b/a = a1, b = a1 a, and with a = 1/a0 that gives b = a1/a0. So the procedure is: invert your y values to get your z values, invert your x values to get your w values, and do the linear regression of z versus w to find a0 and a1. Once you have found a0 and a1, you recover the constants of the original model: a by inverting a0, and b as a1/a0. That's how you fit the saturation growth model with transformed data. Again, keep in mind that you are transforming the data for convenience. You are not finding the least squares fit between the observed and predicted values of y; you are finding it between the observed and predicted values of z, and there is a difference between the two. Let me go ahead and explain what that difference is.
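The whole procedure fits in a short function. This is a minimal Python sketch, not a definitive implementation: the name fit_saturation_transformed is hypothetical, the slope and intercept come from the standard linear regression formulas a1 = (n Σwz - Σw Σz)/(n Σw^2 - (Σw)^2) and a0 = z̄ - a1 w̄, and every xi and yi is assumed to be nonzero.

```python
def fit_saturation_transformed(xs, ys):
    """Fit y = a*x/(b + x) by linear regression of z = 1/y on w = 1/x.

    Hypothetical helper; assumes every x_i and y_i is nonzero.
    Returns the constants (a, b) of the original model.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    ws = [1.0 / x for x in xs]  # w_i = 1/x_i
    zs = [1.0 / y for y in ys]  # z_i = 1/y_i
    sum_w, sum_z = sum(ws), sum(zs)
    sum_ww = sum(w * w for w in ws)
    sum_wz = sum(w * z for w, z in zip(ws, zs))
    denom = n * sum_ww - sum_w ** 2
    if denom == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    a1 = (n * sum_wz - sum_w * sum_z) / denom  # slope of z versus w
    a0 = sum_z / n - a1 * sum_w / n            # intercept: z_bar - a1 * w_bar
    a = 1.0 / a0                               # from a0 = 1/a
    b = a1 / a0                                # from a1 = b/a, so b = a1 * a
    return a, b

# Hypothetical noise-free data generated with a = 2, b = 0.5
xs = [0.5, 1.0, 2.0, 4.0, 8.0]
ys = [2.0 * x / (0.5 + x) for x in xs]
a, b = fit_saturation_transformed(xs, ys)
print(f"a = {a:.6f}, b = {b:.6f}")  # prints a = 2.000000, b = 0.500000
```

With noise-free data the fit recovers the generating constants exactly (up to rounding). With real, noisy data the estimates minimize the error in z = 1/y rather than in y, which is the difference the lecture turns to next.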
With z = a0 + a1 w, what you are basically doing is minimizing the sum of the squares of the residuals of this linear model, that is, the sum from i = 1 to n of (zi - a0 - a1 wi)^2. Substituting zi = 1/yi, a0 = 1/a, a1 = b/a, and wi = 1/xi, that becomes the sum from i = 1 to n of (1/yi - 1/a - (b/a)(1/xi))^2, and that is what you are minimizing when you do this transformed data business. What you should be minimizing, if you want the statistically optimal regression of the original data, is the sum of the squares of the residuals in y itself: the sum from i = 1 to n of (yi - a xi/(b + xi))^2. That approach can also be done, but setting its derivatives to zero gives nonlinear equations in a and b. Since the transformed approach is mathematically convenient, that's why we are taking it instead. And that's the end of this segment.