Figure 1 illustrates the test calculation. Given a noisy signal, we want to determine if there is a resonance hidden within it. Such a response usually has the form of a Gaussian function

*A exp[-(x-mu)^2/(2.0*sigma^2) *[1]

We want to determine values of *A*, *mu* and *sigma* (the resonance width) such that the function gives the most probable fit to the data of Fig. 1. We also want to determine confidence levels for the fit. This is clearly a nonlinear problem because the parameters *mu* and *sigma* are integrated in the function.

The spreadsheet of Fig. 2 is used to generate data over the range 0.0 < *x* < 20.0. Values of *y* are determined by Eq. 1 with *A* = 20.0, *mu* = 8.58 and *sigma* = 1.06. The noise level is set as an adjustable parameter. Rather than save the spreadsheet as a *CSV* file, we copy the data section as shown and past into a text file file, *peakdata.tab*. In this case, the entries on a line are separated by tab characters instead of commas. Therefore, we use a different command form to read the file:

PeakData = read.table("peakdata.tab",header=T,sep="")

The header sets the column names of the data frame *PeakData* to *y* and *x*. The option *sep=””* means that any white-space characters (including tabs) act as delimiters.

To run the example, copy and paste this text into the **R** script editor:

PeakData = read.table("peakdata_high.tab",header=T,sep="") fit = nls(y ~ I(Amp*exp(-1.0*(Mu-x)^2/(2.0*Sigma^2))),data=PeakData, start=list(Amp=5.0,Mu=5.0,Sigma=5.0)) summary(fit) plot(PeakData$x,PeakData$y) new = data.frame(x = seq(min(PeakData$x),max(PeakData$x),len=200)) lines(new$x,predict(fit,newdata=new)) confint(fit,level=0.95)

The critical command line for the nonlinear fit is

fit = nls(y ~ I(Amp*exp(-1.0*(Mu-x)^2/Sigma^2)),data=PeakData, start=list(Amp=5.0,Mu=5.0,Sigma=5.0))

The first argument of the *nls()* command is a defining formula:

y ~ I(Amp*exp(-1.0*(Mu-x)^2/(2.0*Sigma^2)))

It implies that the variable *y* depends on the independent variable *x* as well as parameters named *Amp*, *Mu* and *Sigma*. The dependence is declared explicitly and enclosed in the *I()* function to be safe. The second argument, *data=PeakData*, defines the input data source. The third argument,

start=list(Amp=5.0,Mu=5.0,Sigma=5.0)

needs some explanation. The iterative calculation involves varying parameter values and checking whether the overall error is reduced. The user must set reasonable initial values for the search. The start option requires an **R** list, a general set of values. The *list()* operator combines several objects into a list. In the form shown, the first item in the list has name *Amp* and value 5.0. Be aware that for some choices of parameters, the process may not converge or may converge to an alternate solution.

The *summary()* command produces a console listing like this:

Formula: y ~ I(Amp * exp(-1 * (Mu - x)^2/(2 * Sigma^2))) Parameters: Estimate Std. Error t value Pr(>|t|) Amp 22.45079 1.86245 12.05 1.49e-14 *** Mu 8.63087 0.08361 103.22 < 2e-16 *** Sigma -0.87288 0.08361 -10.44 1.02e-12 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 2.675 on 38 degrees of freedom Number of iterations to convergence: 13 Achieved convergence tolerance: 5.277e-06

The next three commands plot the raw data and the fitting line following methods that we discussed in previous articles. Finally, there is a new command that gives the confidence interval:

confint(fit,level=0.95)

which produces a console listing like this:

2.5% 97.5% Amp 18.738008 26.3724502 Mu 8.460237 8.8129274 Sigma -1.071273 -0.7146478

The results have the following interpretation. What is the range where we can have 95% confidence that the parameter value lies within it. In other word, for the above example we can have 95% confidence that the peak amplitude is between 18.738008 and 26.3724502.

With an understanding of the mechanics of the calculation, we can check out some results. To begin, a high level of noise in the spreadsheet (*Noise* = 5.0) leads to the data points and fitting line of Fig. 3. Here is a summary of numerical results:

Noise level: 5.0 Estimate Std. Error 2.5% 97.5% Generating Amp 22.45079 1.86245 18.738008 26.3724502 20.0 Mu 8.63087 0.08361 8.460237 8.8129274 8.5 Sigma -0.87288 0.08361 -1.071273 -0.7146478 1.06

The negative sign for *Sigma* is not an issue because the parameter always appears as a square. Because the peak is barely discernible in the noise, there is some error in the estimate of *Sigma*. Rerunning the example with reduced noise level (*Noise* = 0.5) provides a check that the *nls()* command is operating correctly. Figure 4 shows the data points and fitting line. The results are much closer to the ideal values with no signal noise.

Noise level: 0.5 Estimate Std. Error 2.5% 97.5% Amp 20.02893 0.17535 19.673012 20.386997 Mu 8.57904 0.01084 8.557082 8.600996 Sigma 1.07258 0.01084 1.050487 1.095135

**Footnotes**

[1] Contact us : techinfo@fieldp.com.

[2] Field Precision home page: www.fieldp.com.

]]>**RStudio**remembers your screen setups, so it is not necessary to resize and reposition floating windows in every session.- The script editor (upper left in Fig. 1) features syntax highlighting and an integrated debugger. You can keep multiple tabbed documents active.
- An additional area at upper-right shows all
**R**objects current defined. Double clicking a data frame opens it in a data editor tab in the script window. - The area at the lower right has multiple functions in addition to plot displays. Other tabs invoke a file manager, a package manager for
**R**supplements and a*Help*page for quick access to information on**R**commands.

I’ll assume you’re using **RStudio** for the example of this article and following ones.

This article shows how to use **R** to fit a three-dimensional polynomial function to noisy data. A practical example is making an accurate interpolation of radiation dose at a point , *D(x,y,z)*, in a **GamBet** calculation. Here, the dose in any single element is subject to statistic variations that depend on the number of model input particles. The idea is to get an estimate at a test point [x0,y0,z0] by finding the most probable smooth function using values from a large number of nearby elements. The approach is valid when the physics implies that the quantity varies smoothly in space (*e.g*, the variation of potential in electrostatics). The present calculation is the three-dimensional extension of the interpolation discussed in a previous article.

As with previous examples, we’ll need to create some test data. The measurement point is [*XAvg* = 23.45, *YAvg* = 10.58, *ZAvg* = -13.70] and the ideal variation of dose about the point is

*D = 19.45 + 2.52*(x-XAvg) – 0.25*(X-XAvg)^2*

* – 4.23*(y-YAvg) + 0.36*(y-YAvg)^2*

* + 1.75*(z-ZAvg) – 0.41*(z-ZAvg)^2 [1]
*

The spacing of data points about the measurement point is *Dx* = 1.00, *Dy* = 1.25 and *Dz* = 0.90. In this case, we’ll use a computer program to record the data in a *CSV* file. For reference, here is the core of the FORTRAN program:

OPEN(FILE='multiregress.csv',UNIT=30) WRITE (30,2000) XAvg = 23.45 YAvg = 10.58 ZAvg = -13.70 Dx = 1.00 Dy = 1.25 Dz = 0.90 ! Main data loop DO Nz = -5,5 z = ZAvg + REAL(Nz)*Dz DO Ny = -5,5 y = YAvg + REAL(Ny)*Dy DO Nx = -5,5 x = XAvg + REAL(Nx)*Dx CALL RANDOM_NUMBER(Zeta) Noise = 2.0*(0.5 - Zeta) D = 19.45 & + 2.52*(x-XAvg) - 0.25*(X-XAvg)**2 & - 4.23*(y-YAvg) + 0.36*(y-YAvg)**2 & + 1.75*(z-ZAvg) - 0.41*(z-ZAvg)**2 & + Noise WRITE (30,2100) x,y,z,D END DO END DO END DO CLOSE (30) 2000 FORMAT ('x,y,z,D') 2100 FORMAT (F14.8,',',F14.8,',',F14.8,',',F14.8)

The program creates a header and then writes 1331 data lines in *CSV* format. Each line contains the position and dose, [*x,y,z,D*]. Here are the initial entries in *multiregress.csv*:

x,y,z,D 18.45000076, 4.32999992, -18.19999886, 22.14448738 19.45000076, 4.32999992, -18.19999886, 28.94458199 20.45000076, 4.32999992, -18.19999886, 38.34004974 ...

Run **R**, clear the console window (*Control-L*) and close any previous script. Copy and paste the following text into the script editor:

rm(list=objects()) setwd("C:/RExamples/MultiRegress") DataSet = read.csv("multiregress.csv",header=T) # Calculate the measurement point XAvg=mean(DataSet$x) YAvg=mean(DataSet$y) ZAvg=mean(DataSet$z) # Reference spatial variables to the measurement point DataSet$x = DataSet$x - XAvg DataSet$y = DataSet$y - YAvg DataSet$z = DataSet$z - ZAvg # Perform the fitting FitModel = lm(D~x+y+z+I(x^2)+I(y^2)+I(z^2),DataSet) summary(FitModel) # Plot a scan along x through the measurement point png(filename="XScan.png") xscan = subset(DataSet,y < 0.001 & y > -0.001 & z < 0.001 & z > -0.001 ) plot(xscan$x,xscan$D) lines(xscan$x,predict(FitModel,newdata=xscan)) dev.off()

The first three lines perform operations that we have previously discussed: clear all **R** objects, set a working directory and read information from the *CSV* file created by the FORTRAN program. The information is stored in data frame *DataSet*. Use *Control-R* in the script editor to execute the command lines. Note that *DataSet* appears in the *Environment* list of **RStudio**. Double-click on **DataSet** to open it in a data-editor tab. Here, you can inspect values and even modify them,

The first task in the analysis is to shift the spatial values so that they are referenced to the measurement point, as in Eq. [1].

XAvg=mean(DataSet$x) YAvg=mean(DataSet$y) ZAvg=mean(DataSet$z) DataSet$x = DataSet$x - XAvg DataSet$y = DataSet$y - YAvg DataSet$z = DataSet$z - ZAvg

We use the *mean()* function under the assumption that the data points are centered about the measurement position. A single command performs the linear fit:

FitModel = lm(D~x+y+z+I(x^2)+I(y^2)+I(z^2),DataSet)

The *summary()* command gives the following information:

Estimate Std. Error Generating (Intercept) 19.540509 0.179142 19.45 x 2.560374 0.025733 2.52 y -4.226011 0.020587 -4.23 z 1.737097 0.028593 1.75 I(x^2) -0.265439 0.009214 -0.25 I(y^2) 0.358165 0.005897 0.36 I(z^2) -0.404630 0.011375 -0.41

As with all statistical calculations, a visual indicator is a quick way to gauge the validity of the fit. The following commands create a two-dimensional plot, a scan of data points and the fitting function along *x* at *y* = 0.0, *z* = 0.0 (through the measurement point). The plot is saved as a *PNG* file in the working directory.

[1] png(filename="XScan.png") [2] xscan = subset(DataSet,y < 0.001 & y > -0.001 & z < 0.001 & z > -0.001 ) [3] plot(xscan$x,xscan$D) [4] lines(xscan$x,predict(FitModel,newdata=xscan)) [5] dev.off()

Line [1] opens a file for *PNG* output, while Line5] flushes the plot information to the file and closes it. Line[2] uses the *subset()* function to create a data frame, *xscan*, that contains only data points where *y* = 0.0 and *z* = 0.0. The first argument is the source data frame and the second argument is the criterion for picking included row. There are two noteworthy features in the command expression:

- The symbol & denotes the logical
*and*operation. - The inclusion criterion has some slack. This is to account to floating point round-off errors. If you inspect
*xscan*with the data editor, you will note that are some values on the order of 10[-8] rather than 0.0

Line [3] plots the raw data points while the *lines()* command in Line 4 applies the method discussed in a previous article to plot the fitting function. Similar operations were applied to create scans through the measurement point along *y* and *z* to produce the plots of Fig. 2.

**Footnotes**

[1] Contact us : techinfo@fieldp.com.

[2] Field Precision home page: www.fieldp.com.

]]># Function to generate test data TrigDemo = function(z) { Out = 1.4+2.30*sin(z)+0.75*cos(z)-1.80*sin(2*z)+2.15*cos(2*z)+0.65*sin(3*z)-0.25*cos(3*z) return(Out) } # Set up some variables zmin = 0.0 zmax = 5.0 Noise = 0.55 # Plot the function without noise curve(TrigDemo(z),zmin,zmax,xname="z") # Create a data frame of input data zvector = seq(from=zmin,to=zmax,length.out=51) phivector = TrigDemo(zvector) TestData = data.frame(z=zvector,phi=phivector) # Add some noise TestData$phi = TestData$phi + rnorm(length(TestData$phi),0,Noise) # Plot the raw data plot(TestData$z,TestData$phi) # Fit the data and list the results FitTrig = lm(phi~I(sin(z))+I(cos(z))+I(sin(2*z))+I(cos(2*z))+I(sin(3*z))+I(cos(3*z)),TestData) summary(FitTrig) # Add the fitted line to the plot PlotSeq = seq(from=0.0,to=5.0,length.out=51) PlotPos = data.frame(z=PlotSeq) lines(PlotPos$z,predict(FitTrig,newdata=PlotPos))

We’ll generate data by adding statistical noise to the test function

*phi = 1.4 + 2.30*sin(z) + 0.75*cos(z) – 1.80*sin(2*z) + 2.15*cos(2*z) + 0.65*sin(3*z) – 0.25*cos(3*z) * [1]

over the interval 0.0 < *z* < 5.0. Equation [1] is a lengthy expression, and it would be inconvenient to add it in command arguments. For efficiency, we’ll define our own function to supplement the built-in functions of **R** like. Here are the corresponding lines:

TrigDemo = function(z) { Out = 1.4+2.30*sin(z)+0.75*cos(z)-1.80*sin(2*z)+2.15*cos(2*z)+0.65*sin(3*z)-0.25*cos(3*z) return(Out) }

The top definition line contains the name of the function, the keyword *function* and the names of the argument (or arguments) that will be used in the calculation. Any number of command lines for calculations with the argument may be included between the braces. **R** functions have flexible output. There is a single output value if the input argument is a scalar while, a vector argument gives a vector output. In other circumstances, the output may even be the functional dependence itself. The type of output depends on the context. To demonstrate the third case, we’ll use the *TrigDemo* function to create a plot of the variation of Eq. [1] using the *curve()* command:

curve(TrigDemo(z),zmin,zmax,xname="z")

The first argument is a definition of the mathematical expression to plot, such as “18.67 + z^2″, the next two arguments are numerical values that give the range of the independent variable and the last argument defines the name of the independent variable. Figure 1 shows the resulting plot.

The following script command lines demonstrate a case where the function produces vector output:

[1] zvector = seq(from=zmin,to=zmax,length.out=51) [2] phivector = TrigDemo(zvector) [3] TestData = data.frame(z=zvector,phi=phivector)

As we saw in the previous article, the *seq()* operator generates a vector *zvector* that contains a set of *z* values over the desired range with interval δz = 0.1. We apply the function *TrigDemo* with argument *zvector* to generator *phivector*, a set of corresponding dependent values. The vectors are combined into a data frame with column names *z* and *phi*. Finally, we want to add some statistical noise:

TestData$phi = TestData$phi + rnorm(length(TestData$phi),0,Noise)

The function *rnorm()* generates random numbers in a normal distribution. **R** includes a related set of random-number functions for different distributions (*runif(), rpois()*,…). The output is a vector. The arguments in *rnorm()* are 1) the length of the output vector, 2) the mean of the distribution and 3) the standard deviation. The points in Fig, 2 show the resulting noisy representation of the generating function.

This form of the *lm()* command performs the fit:

FitTrig = lm(phi~I(sin(z))+I(cos(z))+I(sin(2*z))+I(cos(2*z))+I(sin(3*z))+I(cos(3*z)),TestData)

Finally, this set of commands plot the best-fit line in Fig. 2 following the method discussed in the previous article:

PlotSeq = seq(from=0.0,to=5.0,length.out=51) PlotPos = data.frame(z=PlotSeq) lines(PlotPos$z,predict(FitTrig,newdata=PlotPos))

Note how the column names of *TestData*, *z* and *phi*, are used consistently throughout all the commands.

**Footnotes**

[1] Contact us : techinfo@fieldp.com.

[2] Field Precision home page: www.fieldp.com.

]]>*y = a1*f1(x) + a2*f2(x) + …. *[1]

where *f1(x), f2(x),* … are known functions of any type. The task is to find the set of parameters *a1, a2, …* such that Eq. [1] has the highest probability of matching the measured points. Given that Eq. [1] represents a linear fit, what is a *nonlinear fit*? To illustrate, suppose we wanted to find a resonance response buried in a noisy signal. In this case, the proposed function would be

*y = a1*exp(-(x-a2)^2/a3^2*

We seek three parameters, only one of which is a linear multiple of the function. The other two parameters are integrated within the function. While the linear problem is deterministic, nonlinear calculations require iterative methods that may or may not not converge. A following article discusses nonlinear fitting in **R**.

This article and the next one cover univariate linear fitting, *y = f(x)*. In a following article, we’ll advance to multi-dimensional fitting, *z = g(x,y),*…. To start, let’s revisit the example from the previous article. The points in Figure 1 show the raw data in the file *csvdata.csv*. The points follow a curve with two inflection points, so we suspect that a third order polynomial would be a good guess:

*y = a + b*x + c*x^2 + d*x^3*

This function satisfies the definition of a linear fit (Eq. [1]).

Copy and paste the following text into the script window of **R**:

setwd("C:/RExcamples/LinFitStudies") TestData = read.csv("csvdata.csv",header=TRUE) FitPoly = lm(y~I(x)+I(x^2)+I(x^3),TestData) summary(FitPoly) CheckPoints = data.frame(x = c(0.0,5.0,10.0,15.0,20.0)) Pred = predict(FitPoly,newdata=CheckPoints) Pred plot(TestData$x,TestData$y,type="p") PlotSeq = seq(from=0.0,to=20.0,length.out=201) PlotPos = data.frame(x=PlotSeq) lines(PlotPos$x,predict(FitPoly,newdata=PlotPos))

As before, we set a working directory. It contains a copy of the test data prepared for the previous article. Save the script as *LinFit.R*. Again we read the file contents to a data frame named *TestData* and apply the *lm()* command to produce the fitting object *FitPoly*:

FitPoly = lm(y~I(x)+I(x^2)+I(x^3),TestData)

The fitting function specification is more complex than the first order function of the previous article:

y~I(x)+I(x^2)+I(x^3)

It implies that values in the column named *y* of *TestData* depend on the values in the column named *x*. The functional dependencies involve powers of *x* up to *x^3*. Note the appearance of the **R** command *I()*. It signals that the symbols inside the argument should be interpreted as arithmetic operations rather than function specifications. There is some overlap between the two. For example, in a function specification the + symbol does not designate addition, but rather signals that **R** should include another functional dependency[1].

The command *summary(FitPoly)* produces the following information:

Estimate Std. Error Generating function (Intercept) 22.278672 0.672675 22.67 I(x) 4.877190 0.293093 4.71 I(x^2) -0.314646 0.034174 -0.30 I(x^3) 0.010301 0.001123 0.01

The coefficients of the original generating function have been included in the last column.

There are two more tasks to complete this example:

- Check the interpolated value returned by the fitting function for comparison to the linear interpolation of the previous article.
- Superimpose a plot of the predicted curve on the original data for a visual validity check.

The following command lines accomplish the first task:

[1] CheckPoints = data.frame(x = c(0.0,5.0,10.0,15.0,20.0)) [2] Pred = predict(FitPoly,newdata=CheckPoints) [3] Pred

Line [1] uses the *data.frame()* and *c()* commands that we already discussed. The *c()* command combines several position values (including *x = 15.0*) in a vector. The *data.frame()* command incorporates the vector in a data frame named *CheckPoints* with one column named *x.* Line [2] contains the command *predict()* which requires two arguments: 1) an **R** fitting object and 2) a data frame with a named column that corresponds to the independent variable of the fitting object. The *predict()* command returns a value of *y* for each *x* value. The final line produces the following listing of information in the console window:

1 2 3 4 5 22.27867 40.08612 49.88714 59.40764 76.37351

The fourth value corresponds to *x* = 15.0.

The following lines create the plot of Fig. 1:

[1] plot(TestData$x,TestData$y,type="p") [2] PlotSeq = seq(from=0.0,to=20.0,length.out=201) [3] PlotPos = data.frame(x=PlotSeq) [4] lines(PlotPos$x,predict(FitPoly,newdata=PlotPos))

We’re already familiar with the *plot()* command of Line [1] — it opens the plot and creates the set of points. The next three lines provide a useful template for plotting any fitted line:

- The
*seq()*command creates a vector*PlotSeq*of uniformly-spaced position values over the plot range 0.0 ≤*x*≤ 20.0. The vector contains 201 values with interval δ*x*= 0.1. - Line [2] incorporates
*PlotSeq*into the data frame*PlotPos*. The single column has the name*x*. - The
*lines()*command adds a number of connected lines to an existing plot. The arguments are two vectors of equal length giving values along the abscissa and ordinate. We use*PlotPos$x*as the first vector and again use the*predict()*command to supply the second vector.

The result is the line in Fig. 1.

**Footnotes**:

[1] In the previous article, we used a function specification *y~x* without the *I()* command because the expression did not include any common symbols. To be safe, it is best to use *I()* for all functional expressions.

[2] Contact us : techinfo@fieldp.com.

[3] Field Precision home page: www.fieldp.com.

]]>The most common format for storing data in a text file is c*omma-separated-variable*, or *CSV*. In a general sense, *CSV* could apply to any file that uses commas as a delimiter. We’ll use a more restrictive definition. A *CSV* file contains a number of data lines (rows). A line consists of text terminated by a carriage return and/or linefeed character. Each line is divided into the same number of text entries (columns) separated by commas. The text entries may represent strings or numbers. This organization is reminiscent of spreadsheet data or **R** data frames — it is not surprising that both types of programs can read and write directly to *CSV* files.

In this article we’ll use a spreadsheet to create data for a statistical analysis, save the results as a *CSV* file, read the file into **R** and perform several useful calculations. Figure 1 shows a spreadsheet setup to generate points in the range 0.0 ≤ *x* ≤ 20.0 from the formula

y = 22.67 + 4.71 x -0.3 x^2 + 0.01 x^3 [1]

Random noise uniformly distributed over the range -2.5 ≤ δ*y* ≤ 2.5 is added for statistic analyses in **R**. Note that a header line containing the strings “x” and “y” has been included. The choice of names will be important in **R — **we will make consistent references to them when defining **R** operations.

To create a **CSV** file from **OpenOffice Calc**, simply use *SaveAs* and choose the format option *Text CSV*. Here is a section of the resulting file:

x,y 0,24.8324297602 0.25,21.8520734699 0.5,24.207368359 ...

There are two columns of data and 81 rows (not counting the header). Most programs that read and write *CSV* files (including **Calc** and **R**) have flexible input parsers that recognize strings and numbers in any valid format, including scientific notation.

Here is the **R** script that we will be using:

setwd("C:/RExamples/CSVStudies") TestData = read.csv("csvdata.csv",header=TRUE) plot(TestData$x,TestData$y) Check15 = subset(TestData,x >= 12.5 & x <= 17.5) Check15$x = Check15$x - 15.0 Fit15 = lm(y~x,Check15) summary(Fit15) plot(Check15$x,Check15$y) abline(coef(Fit15)) write.csv(Check15,file="points15.csv",row.names=FALSE)

Run **R** and choose *File/New script*. Copy and paste the lines above into the script editor window. (It’s a good idea to save the script file for later reference.) We’ll step through the lines to learn some useful **R** operations. Set the cursor in the script editor to the first line and press *Control-R* to advance through each step.

It’s always a good idea to organize input/output files in a single directory. This command sets the working directory so you don’t have to enter full paths for file operations.

setwd("C:/RExamples/CSVStudies")

The following command (the core of this article) loads the *CSV* data file into a data frame named *TestData* with 2 columns and 81 rows.

TestData = read.csv("csvdata.csv",header=TRUE)

The columns have the names “*x*” and “*y*“. The option *header=TRUE* signals to **R** that the first line of the file contains the column names. If the file has no header, there are options for the *read.csv* command to set the names explicitly. Go to the console window and type *TestData* to confirm that the information was loaded.

The next command

plot(TestData$x,TestData$y)

creates the plot of the input data shown as the top graph in Fig. 2. It is a noisy version of the curve of Eq. 1.

In a following article, we’ll see how to get an estimate of all the polynomial coefficients in Eq. 1. For now, we’ll concentrate on a local interpolation to estimate the value and slope at a single measurement point, *x* = 15. For this, we’ll include only data points near the measurement point. This command,

Check15 = subset(TestData,x >= 12.5 & x <= 17.5)

produces a data frame *Check15* that is a subset of *TestData*. The selection criterion is that values in the column named “*x*” must be greater than 12.5 and less than 17.5. The range is shown by dashed red lines in Fig. 2. *Check15* contains two columns named “*x*” and “*y*” with 21 rows. Again, type *Check15* in the console to review the content.

We’ll use a routine that finds the slope and intercept of the most probable straight line for the data. The intercept equals the desired interpolation if the measurement point corresponds to x = 0.0. This command shifts all values in the “*x*” column of the new data frame by -15.0:

Check15$x = Check15$x - 15.0

We’re ready to do the fit, using the powerful and versatile l*m* command:

Fit15 = lm(y~x,Check15)

The command has two arguments: 1) a specification of the fitting function and 2) the data frame to be used. The function specification *y~x* is a subtle concept and merits a detailed explanation. An important point is that it is not the fitting equation, but rather an expression of dependencies in the fit.

- The ~ symbol designates a functional relationship rather than an equality.
- The
*y*to the left of ~ specifies that values in the column named “*y*” correspond to the dependent variable. - The
*x*to the right of ~ specifies that values in the column named “*x*” correspond to the independent variable. - The appearance of the single
*x*to the right of ~ indicates that the fit should set a linear relationship,*y = a + bx*.

We’ll have a chance to discuss more complex formula specifications in following articles.

The *lm* operation creates *Fit15*, a special object. It is neither a vector nor a data frame, but rather a set of fitting coefficients and tolerances in a format specific to **R**. There are special operations in **R** to display or to extract information from fitting objects. For example, the operation *summary(Fit15)* produces the following display:

Call: lm(formula = y ~ x, data = Check15)

Residuals: Min 1Q Median 3Q Max -2.3474 -2.1410 0.1371 1.1435 2.8666

Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 59.7433 0.4041 147.851 < 2e-16 *** x 2.5978 0.2669 9.732 8.13e-09 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.852 on 19 degrees of freedom Multiple R-squared: 0.8329, Adjusted R-squared: 0.8241 F-statistic: 94.72 on 1 and 19 DF, p-value: 8.133e-09

This is probably more information than you’d want to know. The important results are that 1) the interpolated value at the point is *y* = 59.7433 +- 0.4041 and 2) the slope is *dy/dx* = 2.5978 +- 0.2669. The predictions from Eq. 1 are *y* = 59.7557 and *dy/dx* = 4.71 -0.6*15 + 0.03*15^2 = 2.4600.

Another function to extract information from a fit is *coef().* For a linear fit, the routine returns two numbers, the intercept and slope. It so happens that there is a plotting command in R, *abline()*, that adds a straight-line plots to an existing plot and its arguments are the intercept and slope of the line. We can therefore understand the following command set:

plot(Check15$x,Check15$y) abline(coef(Fit15))

The *plot* command displays *y* versus *x* over the range of *x* in *Check15*. The *abline* command adds the straight line determined by the fitting operation. The lower plot in Fig. 2 shows the result.

Finally, we can create a *CSV* file with **R**:

write.csv(Check15,file="points15.csv",row.names=FALSE)

The options indicate that row names and quotation marks should be excluded. Here is the initial section of the file *point15.csv*:

x,y -2.5,56.0587661858 -2.25,54.0067949137 -2,54.5795616217 -1.75,57.4048454528 -1.5,56.623110328 ...

If we didn’t include the options, the output of the *write.csv()* function would look like this:

"","x","y" "51",-2.5,56.0587661858 "52",-2.25,54.0067949137 "53",-2,54.5795616217 "54",-1.75,57.4048454528 "55",-1.5,56.623110328 ...

**Footnotes**

[1] Contact us : techinfo@fieldp.com.

[2] Field Precision home page: www.fieldp.com.

]]>Program output is shown in blue and things you enter are shown in red. The prompt “>” indicates that the program is ready for input. Type

Power = 2.4

followed by *Enter*. This defines a simple numerical variable *Power* with the value 2.4. Anything you define is referred to as an *object*. Objects may include string variables, vectors and more complex data assemblies. To see a list of objects, type

objects()

To see the value of *Power*, type

Power

Because **R** is case sensitive, the capital *P* in *Power* is significant. The result is

[1] 2.4

The initial [1] is a display index, useful if the object contains many values.

To illustrate arithmetic operations, type

Number = 3.558 Answer = Number^Power

defining two new objects. Entering the name of the final variable gives the value

> Answer [1] 21.03266

**R** also supports string variables and operations. For example,

S1 = "The first part of the sentence," S2 = "the second part of the sentence." STotal = paste(S1,S2,sep=" ") STotal

Here’s a useful shortcut — it’s not necessary to retype the examples in these articles. Simply copy the text information above and paste it into the console. **R** executes the commands as they are transferred from the clipboard. The resulting console entries are shown in Fig. 2.

The R *paste* function combines two or more strings with the specified separation character. In the example, a space was used. No characters are added if *sep=””*.

Except for the simplest calculations, working in the console mode is impractical. For serious calculations, the best approach is to build and to test a sequence of commands with the option to save them. For this, we will work with **R** scripts (*filename.r*). Choose *File/New script* to open a new window for the script editor. Note that the console window remains active. Copy and paste the following statement into the script window:

rm(list=objects())

Nothing happens in the console window. In the passive mode, the script window acts as a basic text editor. Now, put the cursor on the first line and press *Control-R* (run). The command is transmitted to the console, which performs the action. Go back to the console and type

objects()

The response *character(0)* indicates that all defined objects have been removed. The **R** interface is quite flexible. You can move back and forth between the console and script editor, entering commands directly, editing the script or sending selected commands from the script.

Copy and paste the following content to the script window:

# Clear any objects loaded from a previous session rm(list=objects()) # Define two new vectors xvals = c(1.0,1.5,2.0,2.5,3.0,3.5,4.0) yvals = c(3.4,2.5,4.1,3.6,5.2,4.9,6.3) # Put them into a data frame TestSet = data.frame(V1,V2) # Show the objects that were created objects() # Calculate a function mean(TestSet$yvals) # Plot the data frame plot(TestSet$xvals,TestSet$yvals,type="b")

The set of commands illustrates some critical **R** concepts and defines a practical calculation. Save the script for future use with the name *Exercise0101.R*. Let’s discuss the meanings of the commands before running them.

The *rm* command removes objects specified by the list of names in the argument. Here, a *list* is a collection of objects, a generalized vector. The output of the function *object()* is a list of the names of all defined objects. Hence, the form of the *rm* command removes everything. The lines

xvals = c(1.0,1.5,2.0,2.5,3.0,3.5,4.0) yvals = c(3.4,2.5,4.1,3.6,5.2,4.9,6.3)

create two vector objects named *xvals* and *yvals*. A vector is an ordered set of like items. The *c()* function combines a group of items into a vector, in this case a set of seven numbers. The expression *yvals[4]* returns the number 2.5.

The *data frame* [1] is an important object in **R**. It contains one or more vectors of any type (numbers, strings,…). The restriction is that all vectors must have the same length (number of entries). The data frame structure is appropriate for experimental results as well as the contents of **GamBet** *SRC* files. The command,

TestSet = data.frame(xvals,yvals)

combines the two vectors into a data frame named *TestSet*, which can be thought of as a matrix with two columns and seven rows. It is important to note that the names of things are important in **R**. By default, the column names in *TestSet* are set to the names of the component vectors, *xvals* and *yvals*[2]. You can refer to columns by their names. For example, the expression *TestSet$yvals* has the same content as the *yvals* vector. The command

mean(TestSet$yvals)

gives the average value of the components of the second vector, 4.285714.

The final command

plot(TestSet$xvals,TestSet$yvals,type="b")

calls for a simple plot of data points with *TestSet$xvals* along the abscissa and *TestSet$yvals* along the ordinate. As with almost all **R** functions, the user may set many options in *plot()* to over-ride defaults. In this case, the option *type=”b”* specifies that the plot should contain both point symbols and connecting lines.

To step through each script command in sequence, put the script editor cursor in the first line and then press *Control-R* several times. To execute the full script, highlight all the lines and press *Control-R*. Figure 3 shows the final state of the console window with the display of defined objects and the calculated mean. **R** has also opened up a graphics window to create the plot.

Finally, to run a script from the console without using the script editor, use a command of the form

source("C:/RExamples/Exercise0101.R")

Note that the path specification contains forward slashes rather than the backslashes used in **Windows**. In the plot of Fig. 3, the data are crying out for an estimate of the slope and intercept. We’ll tackle that issue after we learn something about loading data into **R**.

**Footnotes**

[1] Some references use the term *data set* to refer to the content of the vectors and the term *data frame* to refer to the object that holds the content. This distinction seems unnecessary, so I will apply the term *data frame* to designate both the contents and the container.

[2] The *data.frame* command has an option to set the column names explicitly.

[3] Contact us : techinfo@fieldp.com.

[4] Field Precision home page: www.fieldp.com.

]]>In addition, general mathematics programs like SciLab (http://www.scilab.org/) perform many statistical functions.

In [...]]]>
**GamBet**. Although our **GenDist** program performs basic calculations, dedicated high-power statistical packages are useful for intensive work. There are dozens of statistical programs, many of them available for free. A comprehensive list is available at

https://en.wikipedia.org/wiki/List_of_statistical_packages

In addition, general mathematics programs like **SciLab** (http://www.scilab.org/) perform many statistical functions.

In general, the interfaces of technical programs center on a console window where you may enter commands interactively. Commands range from simple arithmetic to complex statistical operations. The programs can read data from text files or spreadsheets and follow a sequence of commands from a script file. There are several options to export results and most packages have plotting capabilities. There are advantages to using standard programs rather than writing your own:

- They feature extensive computational resources, programed for efficiency by statistical experts.
- There is less chance of error because routines have been thoroughly tested.
- In the long run, the setup times are short compared to building your own programs.

On the other hand, the advantages come at a price.

- Powerful technical packages have steep learning curves. Each program has its own command structure, so learning a program is comparable to acquiring a new computer language.
- There are peculiarities of syntax and organization logic. Inexperienced users may suffer hours of frustration searching for errors.
- The power and numerous options of the programs may initially be a drawback, overwhelming new users.
- Matching program capabilities to your specific needs may be challenging.

Fortunately, perseverance pays off. After several days of concentrated effort, the epiphanous moment arrives when everything starts to make sense. One of the goals of this tutorial is to help you reach that moment sooner. The other is to demonstrate how to handle **GamBet** data and how to perform analyses for Monte Carlo calculations.

The first issue is which mathematics package to choose. There are dozens available, each claiming to be the best. It is certainly not feasible to test them all. I am going to concentrate on a single resource, the **R** package, for the following reasons:

- It is freely available for all computer platforms.
- It has a large international user base.
**R**has powerful standard features with add-on packages for sophisticated work.- A well-designed integrated development environment is available.
- Most important, there is excellent documentation including textbooks and online references.

You can obtain executables of the basic **R** package for most operating systems at:

http://www.r-project.org/

There are links on the site to online documentation. The following textbook is well-organized, covers essential features and has numerous applications:

Sarah Stowell, **Using R for Statistics** (Apress, 2014).

The **R Reference Card** is an invaluable resource:

http://cran.r-project.org/doc/contrib/Short-refcard.pdf

After installing the **R** package, I recommend downloading and installing the **RStudio** integrated development environment:

https://www.rstudio.com/products/RStudio/

Each article in the following series is in the form of a short lesson. The articles demonstrate essential **R** techniques for **GamBet** users. The emphasis is on curve fitting and parameter estimation. A topic of particular importance is how to read **GamBet** source (*SRC*) files directly into the **R** environment.

- Working with the
**R**console: vectors and data frames - Creating and reading CSV files
- Linear curve fitting
- Plotting data, curves and fitted lines
- Working with
**RStudio**: multi-dimensional fitting - Nonlinear fitting and parameter estimation
- Reading and organizing
**GamBet**source files - Building distributions and writing
**GamBet**source files

**Footnotes:**

[1] Contact us : techinfo@fieldp.com.

[2] Field Precision home page: www.fieldp.com.

]]>There are two changes:

Expansion of the FORCE command for variance reduction via interaction forcing. The command now [...]]]>There are two changes:

- Expansion of the
*FORCE*command for variance reduction via interaction forcing. The command now includes all types of interactions for all types of particles. Previously, the command covered only bremsstrahlung by electrons and pair production by photons. - Introduction of a new
*KILL*command. Its function is to deactivate specified interactions to estimate relative contributions.

The **GamBet Reference Manual** includes the follow command descriptions:

**FORCE [ELEC,PHOT,POSI] [HELAS,…] Factor [NReg]**

Define a forcing factor to increase the probability of infrequent events. **GamBet** automatically adjusts particle weights and interaction probabilities to maintain unbiased statistics. Multiple *FORCE* commands may appear in the script to enhance several interactions. The first keyword gives the particle type and the second keyword gives the interaction type according to the table of Figure 1. The third parameter is the forcing factor for the interaction (a real number larger than 1.0). The final integer parameter *NReg* is the number of the region in the geometric mesh where forcing should be applied. If *NReg* does not appear, forcing applies in all regions. Note that large forcing factors (> 100) may skew the statistics and also exhaust the available storage in the secondary-particle stack. **GamBet** includes a warning in the listing file if stack overflow errors occurred during a run.

**KILL [ELEC,PHOT,POSI] [HELAS,…]**

Artificially turn off an interaction type. This option is useful to check the relative contributions of interactions. As with the *FORCE* command, the first keyword gives the particle type and the second keyword gives the interaction type following the table of Fig. 1.

**Footnotes:**

[1] Contact us : techinfo@fieldp.com.

[2] Field Precision home page: www.fieldp.com.

]]>Detailed dose calculations for radiation safety require a three-dimensional model of the human body organized by tissue [...]]]>
**Applied X-Ray Science Calculations: Monte Carlo Methods**. In consequence, articles will appear here only infrequently until June 2016. I will post items of interest that I discover while researching the book.

Detailed dose calculations for radiation safety require a three-dimensional model of the human body organized by tissue type. Most of the available information consists of MRI-type images. The gray-scale data indicate divisions within the body, but have no correlation with local physical properties. Fortunately, I found one data set that provides all the information necessary for Monte Carlo radiation simulations, the **Zubal Phantom** (http://noodle.med.yale.edu/zubal/). The work, supported by the U.S. Department of Energy, is described in this paper:

Zubal, I.G., Harrell, C.R, Smith, E.O, Rattner, Z., Gindi, G. and Hoffer, P.B., *Computerized three-dimensional segmented human anatomy*, Medical Physics,** 21**(2):299-302, 1994.

Working from MRI data, the researchers painstakingly identified tissue regions over the scan volume and coded the tissue type as a byte value in each voxel. Four data sets are available:

- Human head (superior-inferior slice direction)
- Human torso (superior-inferior slice direction)
- Human body arms down (superior-inferior slice direction)
- Human body arms folded (lateral slice direction)

The principal investigator has requested that the phantom files (or any modifications of them) not be distributed. Instead, individual users must obtain the files directly from his site. Users are free to modify the files for their applications. Accordingly, I wrote a utility program, **PhantomConv**, to read the phantom information and to produce standard **Mesh** and **MetaMesh** output files in text format. The resulting files may be entered directly into **GamBet** (or any other **TriComp** or **AMaze** solution engines). **PhantomConv** organizes the region numbers in the phantoms into a continuous set and supplies easily-displayed tissue names. The program produces a single *MDF* file containing all slices. It also produces a large set of *MIN* files (one per slice) for 2D work. Figures 1 and 2 show displays in **Mesh** and **MetaMesh**.

Please contact us to obtain a free copy of **PhantomConv** along with instructions for creating your own data set.

**Footnotes:**

[1] Contact us : techinfo@fieldp.com.

[2] Field Precision home page: www.fieldp.com.

]]>It’s easy to set the associated program for an existing file type. In Windows 7, go to *Control panel/Default programs/Associate a file type or protocol with a program*. Scroll down, select the extension and then click the *Change program* button. Navigate to the program executable and pick it.

But, what if the file extension is not on the list? Things get more complicated — there is no procedure in the *Control panel* to add a new file extension. Here’s how to do it by editing the *Registry* with **regedit** (a built-in Windows utility) [1].

Let’s use an example. I often generate or modify *.min* files (input to **Mesh** and **MetaMesh**) with the **ConText** editor. A check of the existing associations shows that *.min* is not included. Here is the procedure to add it:

- Run
**regedit**. (The*Run*option in the*Start Menu*seems to have disappeared from Windows 7. To start the program, right-click in**FP File Manager**, open a terminal window and enter the command*regedit*). - In
**regedit**, right click*HKEY_CLASSES_ROOT*and choose*New/Key*. - Enter the name
*.min*(be sure to include the dot). - Within the new key (right-hand side of the window), right-click on
*(Default)*and choose*Modify*. - Type in the name
*MetaMesh input*. - Again, right-click on
*HKEY_CLASSES_ROOT*and choose*New/Key*. - Name the key
*MetaMesh input*(the same name as in Step 5). - Exit
**regedit**.

Now if you go back to *Associate a file type with a specific program* from the *Control panel*, the file type *.min* appears in the list. Click the *Choose program* button and navigate to **ConText**. Arcane procedure, but it works.

**Footnotes**

[1] Be cautious if you decide to edit the *Registry*. A big mistake could ruin the Windows setup.

[2] Contact us : techinfo@fieldp.com.

[3] Field Precision home page: www.fieldp.com.

]]>