# Nonlinear fitting and parameter estimation

(This is seventh article in the series Using R for Gambet statistical analysis.[1]) To this point, we have carried out basic statistical calculations with R. With some effort, we could perform the operations in spreadsheets or our own programs. In this article, we’ll turn to a much more sophisticated calculation that would be difficult to reproduce: nonlinear parameter fitting. Despite the complexity of the underlying calculations, it is simple to set up the R script.

Figure 1 illustrates the test calculation. Given a noisy signal, we want to determine if there is a resonance hidden within it. Such a response usually has the form of a Gaussian function

A exp[-(x-mu)^2/(2.0*sigma^2)          [1]

We want to determine values of A, mu and sigma (the resonance width) such that the function gives the most probable fit to the data of Fig. 1. We also want to determine confidence levels for the fit. This is clearly a nonlinear problem because the parameters mu and sigma are integrated in the function.

Figure 1. Test data: resonance in a noisy signal.

The spreadsheet of Fig. 2 is used to generate data over the range 0.0 < x < 20.0. Values of y are determined by Eq. 1 with A = 20.0, mu = 8.58 and sigma = 1.06. The noise level is set as an adjustable parameter. Rather than save the spreadsheet as a CSV file, we copy the data section as shown and past into a text file file, peakdata.tab. In this case, the entries on a line are separated by tab characters instead of commas. Therefore, we use a different command form to read the file:

`PeakData = read.table("peakdata.tab",header=T,sep="")`

The header sets the column names of the data frame PeakData to y and x. The option sep=”” means that any white-space characters (including tabs) act as delimiters.

Figure 2. Spreasheet to generate data pasted into a tab-delimited file.

To run the example, copy and paste this text into the R script editor:

```PeakData = read.table("peakdata_high.tab",header=T,sep="")
fit = nls(y ~ I(Amp*exp(-1.0*(Mu-x)^2/(2.0*Sigma^2))),data=PeakData, start=list(Amp=5.0,Mu=5.0,Sigma=5.0))
summary(fit)
plot(PeakData\$x,PeakData\$y)
new = data.frame(x = seq(min(PeakData\$x),max(PeakData\$x),len=200))
lines(new\$x,predict(fit,newdata=new))
confint(fit,level=0.95)```

The critical command line for the nonlinear fit is

`fit = nls(y ~ I(Amp*exp(-1.0*(Mu-x)^2/Sigma^2)),data=PeakData, start=list(Amp=5.0,Mu=5.0,Sigma=5.0))`

The first argument of the nls() command is a defining formula:

`y ~ I(Amp*exp(-1.0*(Mu-x)^2/(2.0*Sigma^2)))`

It implies that the variable y depends on the independent variable x as well as parameters named Amp, Mu and Sigma. The dependence is declared explicitly and enclosed in the I() function to be safe. The second argument, data=PeakData, defines the input data source. The third argument,

`start=list(Amp=5.0,Mu=5.0,Sigma=5.0)`

needs some explanation. The iterative calculation involves varying parameter values and checking whether the overall error is reduced. The user must set reasonable initial values for the search. The start option requires an R list, a general set of values. The list() operator combines several objects into a list. In the form shown, the first item in the list has name Amp and value 5.0. Be aware that for some choices of parameters, the process may not converge or may converge to an alternate solution.

The summary() command produces a console listing like this:

```Formula: y ~ I(Amp * exp(-1 * (Mu - x)^2/(2 * Sigma^2)))

Parameters:
Estimate Std. Error t value Pr(>|t|)
Amp   22.45079    1.86245   12.05 1.49e-14 ***
Mu     8.63087    0.08361  103.22  < 2e-16 ***
Sigma -0.87288    0.08361  -10.44 1.02e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.675 on 38 degrees of freedom

Number of iterations to convergence: 13
Achieved convergence tolerance: 5.277e-06```

The next three commands plot the raw data and the fitting line following methods that we discussed in previous articles. Finally, there is a new command that gives the confidence interval:

`confint(fit,level=0.95)`

which produces a console listing like this:

```2.5%      97.5%
Amp   18.738008 26.3724502
Mu     8.460237  8.8129274
Sigma -1.071273 -0.7146478```

The results have the following interpretation. What is the range where we can have 95% confidence that the parameter value lies within it. In other word, for the above example we can have 95% confidence that the peak amplitude is between 18.738008 and 26.3724502.

Figure 3. Results with Noise = 5.0.

With an understanding of the mechanics of the calculation, we can check out some results. To begin, a high level of noise in the spreadsheet (Noise = 5.0) leads to the data points and fitting line of Fig. 3. Here is a summary of numerical results:

```Noise level: 5.0
Estimate   Std. Error       2.5%      97.5%    Generating
Amp   22.45079    1.86245    18.738008 26.3724502      20.0
Mu     8.63087    0.08361     8.460237  8.8129274       8.5
Sigma -0.87288    0.08361    -1.071273 -0.7146478      1.06```

The negative sign for Sigma is not an issue because the parameter always appears as a square. Because the peak is barely discernible in the noise, there is some error in the estimate of Sigma. Rerunning the example with reduced noise level (Noise = 0.5) provides a check that the nls() command is operating correctly. Figure 4 shows the data points and fitting line. The results are much closer to the ideal values with no signal noise.

```Noise level: 0.5
Estimate  Std. Error      2.5%     97.5%
Amp   20.02893    0.17535    19.673012 20.386997
Mu     8.57904    0.01084     8.557082  8.600996
Sigma  1.07258    0.01084     1.050487  1.095135```

Figure 3. Results with Noise = 0.5.

Footnotes

[1] The entire series is available as a PDF book at http://www.fieldp.com/rintroduction.html.