(This is sixth article in the series *Using R for Gambet statistical analysis*.[1]) **RStudio** is an integrated development environment (IDE) for working with **R**, similar to the IDEs available for most computer languages. Figure 1 shows a screenshot. There are work areas for the script editor, console and plots. The difference is that they are combined in a single convenient workspace rather than floating windows. **RStudio** is freely available and provides a wealth of features, so it is hard to imagine why anyone would choose not to use it. Here are some of the advantages:

**RStudio**remembers your screen setups, so it is not necessary to resize and reposition floating windows in every session.- The script editor (upper left in Fig. 1) features syntax highlighting and an integrated debugger. You can keep multiple tabbed documents active.
- An additional area at upper-right shows all
**R**objects current defined. Double clicking a data frame opens it in a data editor tab in the script window. - The area at the lower right has multiple functions in addition to plot displays. Other tabs invoke a file manager, a package manager for
**R**supplements and a*Help*page for quick access to information on**R**commands.

I’ll assume you’re using **RStudio** for the example of this article and following ones.

This article shows how to use **R** to fit a three-dimensional polynomial function to noisy data. A practical example is making an accurate interpolation of radiation dose at a point , *D(x,y,z)*, in a **GamBet** calculation. Here, the dose in any single element is subject to statistic variations that depend on the number of model input particles. The idea is to get an estimate at a test point [x0,y0,z0] by finding the most probable smooth function using values from a large number of nearby elements. The approach is valid when the physics implies that the quantity varies smoothly in space (*e.g*, the variation of potential in electrostatics). The present calculation is the three-dimensional extension of the interpolation discussed in a previous article.

As with previous examples, we’ll need to create some test data. The measurement point is [*XAvg* = 23.45, *YAvg* = 10.58, *ZAvg* = -13.70] and the ideal variation of dose about the point is

*D = 19.45 + 2.52*(x-XAvg) – 0.25*(X-XAvg)^2*

* – 4.23*(y-YAvg) + 0.36*(y-YAvg)^2*

* + 1.75*(z-ZAvg) – 0.41*(z-ZAvg)^2 [1]
*

The spacing of data points about the measurement point is *Dx* = 1.00, *Dy* = 1.25 and *Dz* = 0.90. In this case, we’ll use a computer program to record the data in a *CSV* file. For reference, here is the core of the FORTRAN program:

OPEN(FILE='multiregress.csv',UNIT=30) WRITE (30,2000) XAvg = 23.45 YAvg = 10.58 ZAvg = -13.70 Dx = 1.00 Dy = 1.25 Dz = 0.90 ! Main data loop DO Nz = -5,5 z = ZAvg + REAL(Nz)*Dz DO Ny = -5,5 y = YAvg + REAL(Ny)*Dy DO Nx = -5,5 x = XAvg + REAL(Nx)*Dx CALL RANDOM_NUMBER(Zeta) Noise = 2.0*(0.5 - Zeta) D = 19.45 & + 2.52*(x-XAvg) - 0.25*(X-XAvg)**2 & - 4.23*(y-YAvg) + 0.36*(y-YAvg)**2 & + 1.75*(z-ZAvg) - 0.41*(z-ZAvg)**2 & + Noise WRITE (30,2100) x,y,z,D END DO END DO END DO CLOSE (30) 2000 FORMAT ('x,y,z,D') 2100 FORMAT (F14.8,',',F14.8,',',F14.8,',',F14.8)

The program creates a header and then writes 1331 data lines in *CSV* format. Each line contains the position and dose, [*x,y,z,D*]. Here are the initial entries in *multiregress.csv*:

x,y,z,D 18.45000076, 4.32999992, -18.19999886, 22.14448738 19.45000076, 4.32999992, -18.19999886, 28.94458199 20.45000076, 4.32999992, -18.19999886, 38.34004974 ...

Run **R**, clear the console window (*Control-L*) and close any previous script. Copy and paste the following text into the script editor:

rm(list=objects()) setwd("C:/RExamples/MultiRegress") DataSet = read.csv("multiregress.csv",header=T) # Calculate the measurement point XAvg=mean(DataSet$x) YAvg=mean(DataSet$y) ZAvg=mean(DataSet$z) # Reference spatial variables to the measurement point DataSet$x = DataSet$x - XAvg DataSet$y = DataSet$y - YAvg DataSet$z = DataSet$z - ZAvg # Perform the fitting FitModel = lm(D~x+y+z+I(x^2)+I(y^2)+I(z^2),DataSet) summary(FitModel) # Plot a scan along x through the measurement point png(filename="XScan.png") xscan = subset(DataSet,y < 0.001 & y > -0.001 & z < 0.001 & z > -0.001 ) plot(xscan$x,xscan$D) lines(xscan$x,predict(FitModel,newdata=xscan)) dev.off()

The first three lines perform operations that we have previously discussed: clear all **R** objects, set a working directory and read information from the *CSV* file created by the FORTRAN program. The information is stored in data frame *DataSet*. Use *Control-R* in the script editor to execute the command lines. Note that *DataSet* appears in the *Environment* list of **RStudio**. Double-click on **DataSet** to open it in a data-editor tab. Here, you can inspect values and even modify them,

The first task in the analysis is to shift the spatial values so that they are referenced to the measurement point, as in Eq. [1].

XAvg=mean(DataSet$x) YAvg=mean(DataSet$y) ZAvg=mean(DataSet$z) DataSet$x = DataSet$x - XAvg DataSet$y = DataSet$y - YAvg DataSet$z = DataSet$z - ZAvg

We use the *mean()* function under the assumption that the data points are centered about the measurement position. A single command performs the linear fit:

FitModel = lm(D~x+y+z+I(x^2)+I(y^2)+I(z^2),DataSet)

The *summary()* command gives the following information:

Estimate Std. Error Generating (Intercept) 19.540509 0.179142 19.45 x 2.560374 0.025733 2.52 y -4.226011 0.020587 -4.23 z 1.737097 0.028593 1.75 I(x^2) -0.265439 0.009214 -0.25 I(y^2) 0.358165 0.005897 0.36 I(z^2) -0.404630 0.011375 -0.41

As with all statistical calculations, a visual indicator is a quick way to gauge the validity of the fit. The following commands create a two-dimensional plot, a scan of data points and the fitting function along *x* at *y* = 0.0, *z* = 0.0 (through the measurement point). The plot is saved as a *PNG* file in the working directory.

[1] png(filename="XScan.png") [2] xscan = subset(DataSet,y < 0.001 & y > -0.001 & z < 0.001 & z > -0.001 ) [3] plot(xscan$x,xscan$D) [4] lines(xscan$x,predict(FitModel,newdata=xscan)) [5] dev.off()

Line [1] opens a file for *PNG* output, while Line5] flushes the plot information to the file and closes it. Line[2] uses the *subset()* function to create a data frame, *xscan*, that contains only data points where *y* = 0.0 and *z* = 0.0. The first argument is the source data frame and the second argument is the criterion for picking included row. There are two noteworthy features in the command expression:

- The symbol & denotes the logical
*and*operation. - The inclusion criterion has some slack. This is to account to floating point round-off errors. If you inspect
*xscan*with the data editor, you will note that are some values on the order of 10[-8] rather than 0.0

Line [3] plots the raw data points while the *lines()* command in Line 4 applies the method discussed in a previous article to plot the fitting function. Similar operations were applied to create scans through the measurement point along *y* and *z* to produce the plots of Fig. 2.

**Footnotes**

[1] The entire series is available as a PDF book at http://www.fieldp.com/rintroduction.html.

[2] Contact us : techinfo@fieldp.com.

[3] Field Precision home page: www.fieldp.com.