(This is seventh article in the series *Using R for Gambet statistical analysis*.[1]) To this point, we have carried out basic statistical calculations with **R**. With some effort, we could perform the operations in spreadsheets or our own programs. In this article, we’ll turn to a much more sophisticated calculation that would be difficult to reproduce: nonlinear parameter fitting. Despite the complexity of the underlying calculations, it is simple to set up the **R** script.

Figure 1 illustrates the test calculation. Given a noisy signal, we want to determine if there is a resonance hidden within it. Such a response usually has the form of a Gaussian function

*A exp[-(x-mu)^2/(2.0*sigma^2) *[1]

We want to determine values of *A*, *mu* and *sigma* (the resonance width) such that the function gives the most probable fit to the data of Fig. 1. We also want to determine confidence levels for the fit. This is clearly a nonlinear problem because the parameters *mu* and *sigma* are integrated in the function.

The spreadsheet of Fig. 2 is used to generate data over the range 0.0 < *x* < 20.0. Values of *y* are determined by Eq. 1 with *A* = 20.0, *mu* = 8.58 and *sigma* = 1.06. The noise level is set as an adjustable parameter. Rather than save the spreadsheet as a *CSV* file, we copy the data section as shown and past into a text file file, *peakdata.tab*. In this case, the entries on a line are separated by tab characters instead of commas. Therefore, we use a different command form to read the file:

PeakData = read.table("peakdata.tab",header=T,sep="")

The header sets the column names of the data frame *PeakData* to *y* and *x*. The option *sep=””* means that any white-space characters (including tabs) act as delimiters.

To run the example, copy and paste this text into the **R** script editor:

PeakData = read.table("peakdata_high.tab",header=T,sep="") fit = nls(y ~ I(Amp*exp(-1.0*(Mu-x)^2/(2.0*Sigma^2))),data=PeakData, start=list(Amp=5.0,Mu=5.0,Sigma=5.0)) summary(fit) plot(PeakData$x,PeakData$y) new = data.frame(x = seq(min(PeakData$x),max(PeakData$x),len=200)) lines(new$x,predict(fit,newdata=new)) confint(fit,level=0.95)

The critical command line for the nonlinear fit is

fit = nls(y ~ I(Amp*exp(-1.0*(Mu-x)^2/Sigma^2)),data=PeakData, start=list(Amp=5.0,Mu=5.0,Sigma=5.0))

The first argument of the *nls()* command is a defining formula:

y ~ I(Amp*exp(-1.0*(Mu-x)^2/(2.0*Sigma^2)))

It implies that the variable *y* depends on the independent variable *x* as well as parameters named *Amp*, *Mu* and *Sigma*. The dependence is declared explicitly and enclosed in the *I()* function to be safe. The second argument, *data=PeakData*, defines the input data source. The third argument,

start=list(Amp=5.0,Mu=5.0,Sigma=5.0)

needs some explanation. The iterative calculation involves varying parameter values and checking whether the overall error is reduced. The user must set reasonable initial values for the search. The start option requires an **R** list, a general set of values. The *list()* operator combines several objects into a list. In the form shown, the first item in the list has name *Amp* and value 5.0. Be aware that for some choices of parameters, the process may not converge or may converge to an alternate solution.

The *summary()* command produces a console listing like this:

Formula: y ~ I(Amp * exp(-1 * (Mu - x)^2/(2 * Sigma^2))) Parameters: Estimate Std. Error t value Pr(>|t|) Amp 22.45079 1.86245 12.05 1.49e-14 *** Mu 8.63087 0.08361 103.22 < 2e-16 *** Sigma -0.87288 0.08361 -10.44 1.02e-12 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 2.675 on 38 degrees of freedom Number of iterations to convergence: 13 Achieved convergence tolerance: 5.277e-06

The next three commands plot the raw data and the fitting line following methods that we discussed in previous articles. Finally, there is a new command that gives the confidence interval:

confint(fit,level=0.95)

which produces a console listing like this:

2.5% 97.5% Amp 18.738008 26.3724502 Mu 8.460237 8.8129274 Sigma -1.071273 -0.7146478

The results have the following interpretation. What is the range where we can have 95% confidence that the parameter value lies within it. In other word, for the above example we can have 95% confidence that the peak amplitude is between 18.738008 and 26.3724502.

With an understanding of the mechanics of the calculation, we can check out some results. To begin, a high level of noise in the spreadsheet (*Noise* = 5.0) leads to the data points and fitting line of Fig. 3. Here is a summary of numerical results:

Noise level: 5.0 Estimate Std. Error 2.5% 97.5% Generating Amp 22.45079 1.86245 18.738008 26.3724502 20.0 Mu 8.63087 0.08361 8.460237 8.8129274 8.5 Sigma -0.87288 0.08361 -1.071273 -0.7146478 1.06

The negative sign for *Sigma* is not an issue because the parameter always appears as a square. Because the peak is barely discernible in the noise, there is some error in the estimate of *Sigma*. Rerunning the example with reduced noise level (*Noise* = 0.5) provides a check that the *nls()* command is operating correctly. Figure 4 shows the data points and fitting line. The results are much closer to the ideal values with no signal noise.

Noise level: 0.5 Estimate Std. Error 2.5% 97.5% Amp 20.02893 0.17535 19.673012 20.386997 Mu 8.57904 0.01084 8.557082 8.600996 Sigma 1.07258 0.01084 1.050487 1.095135

**Footnotes**

[1] The entire series is available as a PDF book at http://www.fieldp.com/rintroduction.html.

[2] Contact us : techinfo@fieldp.com.

[3] Field Precision home page: www.fieldp.com.