- In the transport equation approach to the two-dimensional random walk, the idea is to seek average quantities *n* or **J** and to find relationships between them (like Fick’s first and second laws). These relationships are accurate when there are large numbers of particles. To illustrate the meaning of large, note that the number of electrons in one cubic micrometer of aluminum equals 3 × 10^15. When averages are taken over such large numbers, the transport equations are effectively deterministic.
- In the Monte Carlo method, the idea is to follow individual particles based on a knowledge of their interaction mechanisms. A practical computer simulation may involve millions of model particles, orders of magnitude below the actual particle number. Therefore, each model particle represents the average behavior of a large group of actual particles. In contrast to transport equations, the accuracy of Monte Carlo calculations is dominated by statistical variations.

An additional benefit of transport equations is that they often have closed-form solutions that lead to scaling relationships like Eq. 22 of the previous article. We could extract an approximation to the relationship from Monte Carlo results, although at the expense of some labor.

Despite the apparently favorable features of the transport equations, Monte Carlo is the primary tool for electron/photon transport. Let’s understand why. One advantage is apparent when comparing the relative effort in the demonstration solutions — the Monte Carlo calculation is much easier to understand. A clear definition of the physical properties of particle collisions was combined with a few simple rules. The only derivation required was that for the mean free path. The entire physical model was contained in a few lines of code. In contrast, the transport model required considerable insight and the derivation of several equations. In addition, it was necessary to introduce additional results like the divergence theorem. Most of us feel more comfortable staying close to the physics with a minimum of intervening mathematical constructions. This attitude represents good strategy, not laziness. Less abstraction means less chance for error. A computer calculation that closely adheres to the physics is called a *simulation*. Program managers and funding agents have a warm feeling for simulations.

Beyond the emotional appeal, there is an overriding practical reason to apply Monte Carlo to electron/photon transport in matter. Transport equations become untenable when the interaction physics becomes complex. For example, consider the following scenario for a demonstration calculation:

In 20% of collisions, a particle splits into two particles with velocities 0.5*v0* and 0.2*v0*. The two particles are emitted at random angles separated by 60°. Each secondary particle has its own cross section for interaction with the background obstacles.

It would be relatively easy to modify the code of the first article to represent this history and even more complex ones. On the other hand, it would require considerable effort and theoretical insight to modify a transport equation. As a second example, suppose the medium were not uniform but had inclusions with different cross sections and with dimensions less than λ. In this case, the derivation of Fick’s first law is invalid. A much more complex relationship would be needed. Again, it would be relatively simple to incorporate such a change in a Monte Carlo model. Although these scenarios may sound arbitrary, they are precisely the type of processes that occur in electron/photon showers.

In summary, the goal in collective physics is to describe behavior of huge numbers of particles. We have discussed two approaches:

- **Monte Carlo method**. Define a large but reasonable set of model particles, where each model particle represents the behavior of a group of real particles with similar properties. Propagate the model particles as single particles using known physics and probabilities of interactions. Then, take averages to infer the group behavior.
- **Transport equation method**. Define macroscopic quantities, averages over particle distributions. Derive and solve differential equations that describe the behavior of the macroscopic quantities.

The choice of method depends on the nature of the particles and their interaction mechanisms. Practical calculations often use a combination of the two approaches. For example, consider the three types of calculations required for the design of X-ray devices (supported in our **Xenos** package):

- **Radiation transport in matter**. Photons may be treated with the Monte Carlo technique, but mixed methods are necessary for electrons and positrons. In addition to discrete events (hard interactions) like Compton scattering, energetic electrons in matter undergo small-angle scattering and energy loss with a vast number of background electrons (soft interactions). It would be impossible to model each interaction individually. Instead, averages based on transport calculations are used.
- **Heat transfer**. Here, the particles are packets of energy transferred from one atom to an adjacent one. Because the interaction model is simple and the mean free path is extremely small, transport equations are clearly the best choice.
- **Electric and magnetic fields**. The standard approach is through the Maxwell equations. They are transport-type equations, derived by taking averages over a large number of charges. On the other hand, we employ Monte-Carlo-type methods to treat contributions to fields from high-current electron beams.

**Footnotes**

[1] Use this link for a copy of the full report in PDF format: Monte Carlo method report.

[2] Contact us : techinfo@fieldp.com.

[3] Field Precision home page: www.fieldp.com.

- Although the density may vary in space, the distribution of particle velocities is the same at all points. Particles all have constant speed *v0* and there is an isotropic distribution of direction vectors.
- There is a uniform-random background density of scattering objects.
- Equation 8 of the previous article gives the probability distribution of *a* (the distance particles travel between collisions) in terms of the mean free path λ.

We want to find how the density changes as particles perform their random walk. Changes occur if, on the average, there is a flow of particles (a *flux*) from one region of space to another. If the density *n* is uniform, the same number of particles flow in one direction as the other, so the average flux is zero. Therefore, we expect that fluxes depend on gradients of the particle density. We can find the dependence using the construction of Fig. 2. Assume that the particle density varies in *x* near a point *x0*. Using a coordinate system with origin at *x0*, the first order density variation is given by Eq. 9. The goal is to find an expression for the number of particles per second passing through the line element Δy. To carry out the derivation, we assume the following two conditions:

- The material is homogeneous. Equivalently, λ has the same value everywhere.
- Over scale length λ, relative changes in *n* are small.

Using polar coordinates centered on the line element, consider an element in the plane of area (*r* Δθ)(Δ*r*). We want to find how many particles per second originating from this region pass through Δ*y*. We can write the quantity as the product *Jx* Δ*y*, where *Jx* is the linear flow density in units of particles/m-s. On the average, every particle in the calculation volume has the same average number of collisions per second, given by Eq. 10. The rate of scattering events in the area element equals ν times the number of particles in the area (Eq. 11). The fraction of scattered particles aimed at the segment is given by Eq. 12.

Finally, the probability that a particle scattered out of the area element reaches the line element was given in the previous article as exp(-*r*/λ). Combining this expression with Eqs. 10 and 11, we can determine the current density from all elements surrounding the line segment. Taking the density variation in the form of Eq. 13 leads to the expression of Eq. 14. The integral of the first term in brackets equals zero, so that only the term proportional to the density gradient contributes. Carrying out the integrals, the linear current density is given by Eq. 15. The planar *diffusion coefficient* (with units m^2/s) is given by Eq. 16. Generalizing to possible variations in both *x* and *y*, we can write Eq. 15 in the form of Eq. 17. This relationship between the vector current density and the gradient of density is called Fick’s first law. Equation 18 lists Fick’s second law, a statement of conservation of particles. In the equation, the quantity ∇•**J** is the divergence of flux from a point and *S* is the source of particles at that point (particles/m^2-s). Equation 18 is the *diffusion equation* for particles in a plane. It states that the density at a point changes in time if there is a divergence of flux or a source or sink.
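For reference, the relationships just cited can be written out explicitly (a reconstruction from the surrounding text, since the numbered equations are not reproduced here; the coefficient *D* is the value consistent with the 1/e-radius scaling quoted at the end of this section):

**J** = −*D* ∇*n* (Fick’s first law),

∂*n*/∂*t* = −∇•**J** + *S* (Fick’s second law),

*D* = λ*v0*/2 (planar diffusion coefficient).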

We are now ready to compare the predictions of the model with the Monte Carlo results of the previous section. Equation 19 gives the solution to the diffusion equation for particle emission from the origin of the plane. The quantity *r* equals √(x^2 + y^2). We can verify Eq. 19 by direct substitution, using the cylindrical form of the divergence and gradient operators and taking *D* as uniform in space. In order to make a comparison with the Monte Carlo calculation, we pick a time value *t0* = *Nc* λ/*v0* and evaluate *A* based on the condition of Eq. 20. The resulting expression for the density at time *t0* is given by Eq. 21. The prediction of Eq. 21 is plotted as the solid line in Fig. 1. The results from the two methods show close absolute agreement.

Finally, we can determine the theoretical 1/e radius of the particle cloud from Eq. 21 to yield Eq. 22. In a random walk, the particle spread increases as the square root of the number of transits between collisions. For *Nc* = 100, the value is *re*/λ ≅ 14.1.
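The square-root scaling can be checked directly against a simulation. The following Python sketch (an illustration for this article, not the original demonstration code, which was Fortran) repeats the random walk with λ = 1 and computes the rms cloud radius; for a Gaussian cloud the rms radius equals the 1/e radius, so the result should approach λ√(2*Nc*) ≈ 14.1 for *Nc* = 100:

```python
import math
import random

def rms_radius(n_particles, n_steps, lam=1.0, seed=1):
    """Random walk in a plane with exponential step lengths; returns
    the rms distance from the origin after n_steps collisions."""
    rng = random.Random(seed)
    total_r2 = 0.0
    for _ in range(n_particles):
        x = y = 0.0
        for _ in range(n_steps):
            angle = 2.0 * math.pi * rng.random()
            # exponential step length with mean lam; 1 - random() avoids log(0)
            step = -lam * math.log(1.0 - rng.random())
            x += step * math.cos(angle)
            y += step * math.sin(angle)
        total_r2 += x * x + y * y
    return math.sqrt(total_r2 / n_particles)
```

With a few thousand model particles the statistical error in the rms radius is below one percent, consistent with the value quoted above.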


In the Monte Carlo method, the full set of particles is represented by a calculable set of model particles. In this case, each model particle represents a group. We follow detailed histories of model particles as they undergo random events like collisions with atoms. Characteristically, we use a random-number generator with a known probability distribution to determine the outcomes of the events. In the end, the core assumption is that averages over model particles represent the average behavior of the entire group. The alternative to this approach is the derivation and solution of *moment* (or *transport*) equations. The following article covers this technique.

Instead of an abstract discussion, we’ll address a specific example to illustrate the Monte Carlo method. Consider a random walk in a plane. As shown in Fig. 1, particles emerge from a source at the origin with uniform speed *v0*. They move freely over the surface unless they strike an obstacle. The figure represents the obstacles as circles of diameter *w*. The obstacles are distributed randomly and drift about so we can never be sure of their position. The velocity of obstacles is much smaller than *v0*. If a particle strikes an obstacle, we’ll assume it bounces off at a random direction with no change in speed. The obstacles are unaffected by the collisions.

In a few sentences, we have set some important constraints on the physical model:

- The nature of the particles (constant speed *v0*).
- The nature of the obstacles (diameter *w*, high mass compared to the particles).
- The nature of the interaction (elastic collision with isotropic emission from the collision point).

The same type of considerations apply to calculations of radiation transport. The differences are that 1) the model particles have the properties of photons and electrons, 2) the obstacles are the atoms of materials and 3) there are more complex collision models based on experimental data and theory. To continue, we need to firm up the features of the calculation. Let’s assume that 10^10 particles are released at the origin at time *t* = 0. Clearly, there are too many particles to handle on a computer. Instead, we start *Np* = 10,000 model particles and assume that they will give a good idea of the average behavior. In this case, each model particle represents 10^6 real particles. We want to find the approximate distribution of particle positions after they make *Nc* collisions. The logic of a Monte Carlo calculation for this problem is straightforward. The first model particle starts from the origin moving in a random direction. We follow its history through *Nc* collisions and record its final position. We continue with the other *Np* – 1 model particles and then interpret the resulting distribution of final positions.

The source position is *x* = 0, *y* = 0. To find the emission direction, we use a random number generator, a component of all programming languages and spreadsheets. Typically, the generator returns a random number ξ equally likely to occur anywhere over the interval of Eq. 1. Scaling ξ to span the range 0 → 2π, the initial unit direction vector is given by Eq. 2.

The particle moves a distance *a* from its initial position and then has its first collision. The question is, how do we determine *a*? It must be a random quantity because we are uncertain how the obstacles are lined up at any time. In this case, we seek the distribution of expectations that the particle has a collision at distance *a*, where the distance may range from 0 to ∞. To answer the question, we’ll make a brief excursion into probability theory.

Let *P(a)* equal the probability that the particle moves a distance *a* without a collision with an object. By convention, a probability value of 0.0 corresponds to an impossible event and 1.0 indicates a certain event. Therefore, *P*(0) = 1.0 (there is no collision if the particle does not move) and P(∞) = 0.0 (a particle traveling an infinite distance must encounter an object). We can calculate *P(a)* from the construction of Figure 2. The probability that a particle reaches *a* + Δ*a* equals the probability that the particle reaches *a* times the probability that it passes through the layer of thickness Δ*a* without a collision. The second quantity equals 1.0 minus the probability of a collision.

To find the probability of a collision in the layer, consider a segment of height *h*. If the average surface density of obstacles is *N* particles/m^2, then the segment is expected to contain *Nh* Δ*a* obstacles. Each obstacle is a circle of diameter *w*. The distance range for an interaction with an obstacle is called the cross-section σ. In this case, we will associate the interaction width with the obstacle diameter, or σ = *w*. The fraction of the height of the segment obscured by obstacles is given by Eq. 3. The exit probability is given by Eq. 4.

A first-order Taylor expansion (Eq. 5) leads to Eq. 6. Equation 6 defines another useful quantity, the *macroscopic cross section* Σ = *N*σ with dimensions 1/m. Solving Eq. 6 leads to Eq. 7. The new quantity in Eq. 7 is the mean free path, λ = 1/Σ. It equals the average value of *a* for the exponential probability distribution. The ideas of cross section, macroscopic cross section and mean free path are central to particle transport.
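As a numerical aside (not part of the original derivation), the limiting process behind Eqs. 5-7 can be checked by multiplying the per-layer survival factors directly. This Python sketch assumes a macroscopic cross section Σ and verifies that the product of (1 − Σ Δ*a*) factors over many thin layers approaches exp(−Σ*a*) = exp(−*a*/λ):

```python
import math

def survival_probability(a, Sigma, n_layers=100000):
    """Multiply the survival factor (1 - Sigma*da) over thin layers;
    this is the discrete version of dP/da = -Sigma*P."""
    da = a / n_layers
    p = 1.0
    for _ in range(n_layers):
        p *= (1.0 - Sigma * da)
    return p
```

For example, with Σ = 2 m⁻¹ (λ = 0.5 m), the survival probability over *a* = 1 m converges to exp(−2) ≈ 0.1353 as the layers become thin.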

We can now solidify our procedure for a Monte Carlo calculation. The first step is to emit a particle at the origin in the direction determined by Eq. 2. Then we move the particle forward a distance *a* consistent with the probability function of Eq. 7. One practical question is, how do we create an exponential distribution with a random number generator that produces only a uniform distribution in the interval of Eq. 1? The plot of the probability distribution of Eq. 7 in Fig. 3 suggests a method. Consider the 10% of particles with collision probabilities between *P* = 0.3 and *P* = 0.4. The corresponding range of paths extends from *a*/λ = -ln(0.4) = 0.9163 to *a*/λ = -ln(0.3) = 1.204. If we assign path lengths from the uniform random variable according to Eq. 8, then we can be assured that on the average 10% will lie in the range *a*/λ = 0.9163 to 1.204. By extension, if we apply the transformation of Eq. 8 to a uniform distribution, the resulting distribution will be exponential. To confirm, the lower section of Fig. 3 shows a random distribution calculation with 5000 particles.
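The transformation takes only a few lines. This Python sketch (an illustration, not the article's code) applies *a* = −λ ln(ξ) to uniform deviates and checks the fraction that falls in the 10% band discussed above:

```python
import math
import random

def sample_paths(n, lam=1.0, seed=2):
    """Exponential path lengths from uniform deviates via a = -lam*ln(xi).
    Using 1 - random() avoids log(0), since random() can return exactly 0."""
    rng = random.Random(seed)
    return [-lam * math.log(1.0 - rng.random()) for _ in range(n)]

paths = sample_paths(50000)
# Fraction with a/lam between -ln(0.4) = 0.9163 and -ln(0.3) = 1.204;
# on the average, 10% of the samples should land here.
frac = sum(1 for a in paths if 0.9163 <= a <= 1.204) / len(paths)
```

The sample mean of the path lengths also converges to λ, the mean free path.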

To continue the Monte Carlo procedure, we stop the particle at a collision point a distance *a* from the starting point determined by Eq. 8 and then generate a new random number ξ to determine the new direction according to Eq. 2. Another call to the random-number generator gives a new propagation distance *a* from Eq. 8. The particle is moved to the next collision point. After *Nc* events, we record the final position and start the next particle. The simple programming task, with the choice λ = 1, is performed by the following code:

```fortran
DO Np = 1, NShower
   ! Start from center
   XOld = 0.0
   YOld = 0.0
   ! Loop over steps
   DO Nc = 1, NStep
      ! Random direction
      CALL RANDOM_NUMBER(Xi)
      Angle = DTwoPi*Xi
      ! Random length
      CALL RANDOM_NUMBER(Xi)
      Length = -LOG(Xi)
      ! Add the vector
      X = XOld + Length*COS(Angle)
      Y = YOld + Length*SIN(Angle)
      XOld = X
      YOld = Y
   END DO
END DO
```

Figure 4 shows the results for λ = 1 (equivalently, the plot is scaled in units of mean-free-paths). The left-hand side shows the trajectories of 10 particles for *Nc* = 100 steps. With only a few particles, there are large statistical variations, making the distribution in angle skewed. We expect that the distribution will become more uniform as the number of particles increases because there is no preferred emission direction. The right-hand side is a plot of final positions for *Np* = 10,000 particles. The distribution is relatively symmetric, clustered within roughly 15 mean free paths of the origin. In comparison, the average total distance traversed by each particle is 100.

Beyond the visual indication of Fig. 4, we want quantitative information about how far particles move from the axis. To determine density as a function of radius, we divide the occupied region into radial shells of thickness Δ*r* and count the number of final particle positions in each shell and divide by the area of the shell. Figure 5 shows the results. The circles indicate the relative density of particles in shells of width 0.8λ. Such a plot is called a *histogram* and the individual shells (containers) are called *bins*. Histograms are one of the primary methods of displaying Monte Carlo results. Note that the points follow a smooth variation at large radius, but that they have noticeable statistical variations at small radius. The reason is that the shells near the origin have smaller area, and therefore contain fewer particles to contribute to the average. Statistical variations are the prime concern for the accuracy of Monte Carlo calculations.
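The shell-counting procedure can be sketched in Python (a hypothetical helper for illustration, not the article's code); each count is divided by the area of its annular bin:

```python
import math

def radial_density(points, dr, n_bins):
    """Histogram final (x, y) positions into radial shells of width dr
    and divide each count by the shell area pi*((i+1)^2 - i^2)*dr^2."""
    counts = [0] * n_bins
    for x, y in points:
        i = int(math.hypot(x, y) / dr)
        if i < n_bins:
            counts[i] += 1
    return [c / (math.pi * (2 * i + 1) * dr * dr)
            for i, c in enumerate(counts)]
```

For example, a single particle at radius 0.5 with Δ*r* = 1 lands in the innermost shell, which has area π, so its density contribution is 1/π.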


The utility collection started with the **FP Universal Scale**. It grew out of my frustration with conventional screen rulers which were either rigidly referenced to screen pixels or absolute units like inches or centimeters. A more useful approach is to reference the ruler to the units of the graph or photograph to be measured. Accordingly, after several years of thought I set out to create an on-screen version of the much-loved Gerber Variable Scale. The implementation involved intensive interactions with the Windows API, so I decided to use **RealBasic** with purchased plugins to handle screen overlays. During development, the program expanded from a simple screen ruler to a complete screen digitization system for scientists and engineers.

There were three motivations for the next utility, the **FP File Organizer**:

- In comparison to sophisticated two-window file managers like **Free Commander**, I wanted a simple, clean interface that supported the functions I used every work day.
- Our technical programs involve extended file organization. In discussing file management in tutorials, I wanted a standard reference environment.
- I needed a general file-manager unit for my MIDI programs.

**FP File Organizer** has several nice features like fast file searches, full path copy to the clipboard, definable tools, special folders and desktop shortcut creation. I use the program for all my work except for multi-GB file transfers. For these, I use xcopy or robocopy.

The **Cecil_B** program converts an organized set of BMP files into an AVI movie. I developed it in response to a customer request to make animations of solutions in time-domain programs like **TDiff** and **HeatWave**. I created the final two utilities, **Computer Task Organizer** (**CTO**) and **Boilerplate**, to reduce frustrations I noticed over 30 years of using Windows. With regard to **CTO**, I found that most of my work day involved running the same programs with the same documents or going to the same website repetitively. The program reduces the 100 tasks that I perform every day to single button clicks.

The new utility **Boilerplate** (Figure 1) expands the functions of the Windows clipboard in two ways:

- You can build a library of standard text selections (*i.e.*, boilerplate) that can be transferred to the clipboard with a single button press — ready to paste into a document.
- You can recall items previously on the clipboard.

The second feature deals with an irritating limit of the clipboard — it stores only one item at a time. **Boilerplate** keeps a running record of the last twenty clipboard texts — they can be recalled to the clipboard with a single button click. I got the idea from the old utility **Clipboard Magik**. The program had a lot of potential, but was difficult to utilize in practice.


The complicating factor is that the inner workings of the MRI magnet are proprietary, so my contact would not know the geometry of the drive currents and iron poles that generate the field. He would have to base his shielding calculations entirely on fringing field patterns supplied by his customer (like that of Fig. 1). Everything inside the inner line would be a mystery. My contact was being pursued by a sales rep for a well-known alternative to **Magnum**. I won’t name names, but for the sake of discussion let’s refer to the program as Lucia di Lammermoor (LDL). The sales rep felt he had the perfect solution. On top of the high price of LDL, my contact could buy a special inverse-solution add-on that would determine the unknown magnet configuration from the fringing-field pattern. There are two drawbacks to this approach:

- James Clerk Maxwell says it’s impossible.
- It’s totally unnecessary.

The critical insight is that the fringing fields of any solenoid assembly (no matter how complex) approach those of a simple magnetic dipole in the region outside the assembly. This tutorial reviews the theory:

Magnetic Dipole Moment of a Coil Assembly

To emphasize the point, Figure 2 shows contours of |**B**| calculated by **Magnum** for a current loop of radius *R* = 0.5 m carrying current *I* = 1000.0 A (the plot plane includes the magnet *z* axis). The line shapes clearly resemble those of Fig. 1. Complicating the comparison is the fact that the lines and intervals of Fig. 1 (supplied by my contact’s customer) are physically impossible. There are three possible explanations:

- LDL gave the wrong answer.
- The LDL user at the magnet manufacturer did not pay attention to the potentially large effects of computation boundaries on the weak fringing fields.
- The magnet manufacturer was reluctant to send actual data to my contact, so they had a draftsman create them.

Supposing that some day my contact receives a PDF document with real data, here’s how the analysis would proceed.

1) Use the **Universal Scale** (shown in Fig. 1) to measure the locations (*zi*) of contours |*Bi*| along the *z* axis and an axis normal to *z* passing through the magnet center (*xi*).

2) Calculate estimates of the magnet dipole moment from the equations

*mi* = *Bi* *zi*^3/(μ0/2π),  *mi* = 2 *Bi* *xi*^3/(μ0/2π).

For a valid field distribution, all of the estimated values should be close, giving an average value *m*.

3) Set up a **Magnum** solution volume (large compared to the diagnostic room to minimize boundary effects). For the applied field, define a circular coil normal to *z* at the origin with a radius *R* comparable to that of the magnet assembly. Assign the coil current according to *I* = *m*/(π*R*^2) to replicate the fringing field pattern.

4) Add shielding walls as needed, and run a standard **Magnum** calculation. Analyze the 5 gauss contour to make sure it is everywhere within the diagnostic room.
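Steps 2 and 3 reduce to a few lines of arithmetic. The following Python sketch (with hypothetical sample values; the function and variable names are inventions for illustration) forms the moment estimates from measured contour positions and converts the average to an equivalent coil current:

```python
import math

MU0 = 4.0e-7 * math.pi  # permeability of free space, T*m/A

def moment_estimates(z_vals, bz_vals, x_vals, bx_vals):
    """Step 2: dipole-moment estimates m_i from on-axis samples (B_i, z_i)
    and transverse samples (B_i, x_i); returns the average value."""
    k = MU0 / (2.0 * math.pi)
    ests = [b * z**3 / k for z, b in zip(z_vals, bz_vals)]
    ests += [2.0 * b * x**3 / k for x, b in zip(x_vals, bx_vals)]
    return sum(ests) / len(ests)

def coil_current(m, R):
    """Step 3: current in a loop of radius R with moment m = I*pi*R^2."""
    return m / (math.pi * R**2)
```

Applied to field values generated by an ideal dipole, the estimates all agree and the recovered coil current reproduces the source, which is the consistency check described in step 2.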

In the calculation, scaling relationships may be applied to deal with thin sheets of iron following the discussion in this tutorial:

To facilitate the application, we have added two features to **Magnum**:

- We have modified the contour-line plot routines in **MagView** to enable users to enter a set of specific values (e.g., the 5 gauss limit).
- We have doubled the number of entries in the library of soft magnetic materials supplied with the code to include common shielding materials like M36.

Magnetic material data are available at http://www.fieldp.com/magneticproperties.html.

**Footnotes**

[1] You can get information on Magnum at http://www.fieldp.com/magnum.html.


Let’s start by creating some data to work with. We’ll model the kinetic energy distribution of positrons emitted in the β decay of Na22. The maximum value is 540 keV. To approximate the probability mass density, we’ll use one half cycle of a sine function skewed toward higher energies to represent Coulomb repulsion from the product nucleus. Copy and paste the following commands to the **RStudio** script editor:

```r
rm(list=objects())
xmax = 540.0
nmax = 50
MassDens = function(x) {
  Out = sin(x*pi/(xmax))*(0.35+0.4*x/xmax)
  return(Out)
}
xval = c(seq(from=0.0,to=xmax,length.out=(nmax+1)))
mdens = numeric(length=(nmax+1))
for (n in 1:(nmax+1)) {
  mdens[n] = MassDens(xval[n])
}
plot(xval,mdens)
curve(MassDens,0.0,xmax,xname="x",add=TRUE)
```

The commands

```r
MassDens = function(x) {
  Out = sin(x*pi/(xmax))*(0.35+0.4*x/xmax)
  return(Out)
}
```

define a function *MassDens* that follows the curve of Figure 1. The command

```r
xval = c(seq(from=0.0,to=xmax,length.out=(nmax+1)))
```

creates a vector of 51 points equally spaced along the energy axis from 0.0 to 540.0 keV. The commands

```r
mdens = numeric(length=(nmax+1))
for (n in 1:(nmax+1)) {
  mdens[n] = MassDens(xval[n])
}
```

create an empty vector to hold the probability mass density *p(x)* and fill the values with a loop that uses the *MassDens* function. At this point, the relative probability is sufficient — it is not necessary to worry about normalization.

To assign energy values for a particle distribution, we need the cumulative probability distribution, defined as

*P(x)* = ∫ *p(x’)dx’*

where the integral runs from *xmin* to *x*. The function *P(x)* equals the probability that the energy is less than or equal to *x*. It has a value of 0.0 at *xmin* and 1.0 at *xmax*. The following commands use the trapezoidal rule to perform the integral:

```r
dx = xmax/nmax
cdens = numeric(length=(nmax+1))
cdens[1] = 0.0
for (n in 2:(nmax+1)) {
  cdens[n] = cdens[n-1] + (mdens[n-1]+mdens[n])*dx/2.0
}
cdens = cdens/cdens[51]
```

Note that the final command normalizes the function. The result is a vector *cdens* of 51 points to represent the cumulative distribution.
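The same trapezoidal accumulation is easy to express in other languages. Here is a Python sketch (an illustration mirroring the R loop above, not part of the tutorial scripts):

```python
def cumulative_from_density(xvals, dens):
    """Trapezoidal-rule running integral of dens over xvals,
    normalized so the final value equals 1.0."""
    cdens = [0.0]
    for i in range(1, len(xvals)):
        dx = xvals[i] - xvals[i - 1]
        cdens.append(cdens[-1] + 0.5 * (dens[i - 1] + dens[i]) * dx)
    total = cdens[-1]
    return [c / total for c in cdens]
```

For a constant density the result is a straight ramp from 0.0 to 1.0, a quick sanity check on the normalization.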

The cumulative probability *P(x)* is a monotonically-increasing function of *x* — there is a unique value of *x* for every value of *P(x)*. Therefore, we can view *x* as a function of *P(x)*, as illustrated by Fig. 2 created with the command:

```r
plot(cdens,xval)
```

We can also make an accurate calculation of *x* for any *P(x)* using the spline interpolation functions of **R**. The line in Fig. 2 was created with the command:

```r
lines(spline(cdens,xval))
```

Figure 3 illustrates the principle underlying the sampling method. Suppose we want to create 10,000 particles. By the definition of the cumulative probability distribution, a total of 1000 of the particles should be contained within the interval 0.6 ≤ *P* ≤ 0.7. The graph shows that these particles should be assigned energies in the range 320 keV ≤ *x* ≤ 360 keV. In other words, if ζ represents a uniform sequence of 10,000 numbers from 0.0 to 1.0, then the desired distribution will result if the energy values are assigned according to

*x*[*n*] = *P*^-1(ζ[*n*])

The **R** expression for the assignment operation uses a form of the *spline()* function that creates values at specified points:

```r
NMax = 10000
dpoints = numeric(length=NMax)
zetavals = c(seq(from=0.0,to=1.0,length.out=NMax))
ztemp = spline(cdens,xval,xout=zetavals)
dpoints = ztemp$y
hist(dpoints,breaks=35)
```

In this case, the *spline()* function returns a data frame containing both the independent and dependent values. The assigned energies are set equal to the dependent values, *dpoints = ztemp$y*. The top graph in Figure 4 shows a histogram of the resulting distribution.
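The same assignment can be mirrored with standard-library linear interpolation in Python (a sketch; the *spline()* call above gives a smoother fit, but the sampling principle is identical):

```python
import bisect

def inverse_cdf(cdens, xvals, zeta):
    """For each zeta in [0, 1], return x such that P(x) = zeta,
    using piecewise-linear interpolation of the tabulated CDF."""
    out = []
    for z in zeta:
        i = bisect.bisect_left(cdens, z)
        if i == 0:
            out.append(xvals[0])
        elif i >= len(cdens):
            out.append(xvals[-1])
        else:
            f = (z - cdens[i - 1]) / (cdens[i] - cdens[i - 1])
            out.append(xvals[i - 1] + f * (xvals[i] - xvals[i - 1]))
    return out
```

Feeding in a uniform sequence of ζ values returns energies distributed according to the original mass density, exactly as in the R script.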

The routine creates the same distribution each time the script is run. In some circumstances, you may want to add variations so that runs are statistically independent. In this case, the uniform sequence of ζ values may be replaced with a random-uniform distribution:

```r
zetavals = runif(NMax,0.0,1.0)
```

The lower graph in Fig. 4 shows the result.

In conclusion, the **R** package has a vast set of available commands and options that could occupy several textbooks. In this tutorial, I’ve tried to cover the fundamental core. My goal has been to clarify the sometimes arcane syntax of R so you have the background to explore additional functions. A compendium of scripts and data files for the examples is available. To make a request, please contact us.

**Footnotes**

[1] The entire series is available as a PDF book at http://www.fieldp.com/rintroduction.html.



The parameters of primary particles for Monte Carlo simulations in **GamBet** are specified in source (*SRC*) files. Output escape files have the same format, so the output from one **GamBet** calculation can be used as the input to a subsequent one. For initial **GamBet** simulations, a source file representing specific distributions in position, velocity and energy is usually prepared. Although the **GenDist** utility can create several useful distributions, the possibilities with **R** are greatly expanded.

As a test case, we’ll generate an input electron beam for a 3D calculation with current *CurrTotal* = 2.5 A and average energy *E0* = 20.0 keV. The beam has a circular cross section with a Gaussian distribution in *x* and *y* of width *Xw* = *Yw* = 1.4 mm. The average position is [*X0, Y0, Z0*] = [0.0 mm, 0.0 mm, 0.0 mm]. The parallel electrons move in *z*. There is a Gaussian energy spread of *Ew* = 500 eV.

To begin, we set up a script that can be tested in the interactive environment of **RStudio** and then see how to convert it to an autonomous program that can be run from a batch file. The first set of commands clears the workspace, sets a working directory and defines parameters:

rm(list=objects())
WorkDir = "C:/USINGRFORGAMBETSTATISTICALANALYSIS/Examples/Section09/"
setwd(WorkDir)
CurrTotal = 2.5
X0 = 0.0
Y0 = 0.0
Z0 = 0.0
Xw = 1.4
Yw = 1.4
E0 = 20000.0
Ew = 500.0
Ux0 = 0.0
Uy0 = 0.0
Uz0 = 1.0
NPart = 100000

The previous article discussed the *SRC* file format. The strategy will be to set up a vector of length *NPart* for each quantity in the file data lines, combine them into a data frame and then use the *write.table()* command to create the file. These commands create the vector for the *Type* column:

Type = character(length=NPart)
for (n in 1:NPart) {
  Type[n] = "E"
}

The first command creates a character vector called *Type* with *NPart* blank entries. The loop fills the vector with the character *E*.
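As an aside, the *rep()* function builds the same vector without an explicit loop; a minimal sketch with an illustrative value of *NPart*:

```r
NPart = 1000                        # illustrative value

# Loop form from the text
Type = character(length=NPart)
for (n in 1:NPart) {
  Type[n] = "E"
}

# Vectorized equivalent
TypeAlt = rep("E", NPart)
```

The two forms produce identical vectors; the vectorized form is faster for large *NPart*.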

The quantities *Z, Ux, Uy* and *Uz* are numerical vectors of length *NPart* that contain identical values. Here, it is convenient to create and fill the vectors with the *seq()* and *c()* commands:

Z = c(seq(from=Z0,to=Z0,length.out=NPart))
Ux = c(seq(from=Ux0,to=Ux0,length.out=NPart))
Uy = c(seq(from=Uy0,to=Uy0,length.out=NPart))
Uz = c(seq(from=Uz0,to=Uz0,length.out=NPart))

The position vectors *X* and *Y* and the *Energy* vector are created using the *rnorm()* function. It generates *NPart* values following a specified normal distribution:

X = rnorm(NPart,mean=X0,sd=Xw)
Y = rnorm(NPart,mean=Y0,sd=Yw)
Energy = rnorm(NPart,mean=E0,sd=Ew)
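The statistical behavior of *rnorm()* can be spot-checked directly. In this sketch (illustrative parameters, fixed seed so the check is repeatable), the sample mean and standard deviation of a large draw land close to the requested values:

```r
set.seed(11)                       # fixed seed for repeatability
NCheck = 100000                    # illustrative sample size
X = rnorm(NCheck, mean = 0.0, sd = 1.4)

# With this many samples, mean(X) should be near 0.0 and sd(X) near 1.4
mean(X)
sd(X)
```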

The set of vectors is assembled into a data frame. Note that the column names are the same as the vector names, *Type*, *Energy*, *X*, …:

SRCRaw = data.frame(Type,Energy,X,Y,Z,Ux,Uy,Uz)

We must take some precautions. With the normal distribution, there is a very small but non-zero probability of extreme values of the position and energy. They could result in electrons outside the **GamBet** solution volume or negative energy values. Both conditions would lead to a program error. We create a subset of the raw data such that no electron has *r* > 7.0 mm or *Energy* ≤ 0.0:

SRCFile = subset(SRCRaw,((X^2+Y^2)<=25.0*Xw^2) & (Energy > 0.0))

We need to add the current per electron to complete the *SRCFile* data frame. Note the use of *NLength*, the number of electrons in the modified data frame, which may be less than *NPart*:

NLength = length(SRCFile$Type)
dCurr = CurrTotal/NLength
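The trimming step can be exercised on synthetic data to confirm that *NLength* tracks the surviving rows. A self-contained sketch with illustrative parameters (names mirror the text, but the data here is throwaway):

```r
set.seed(5)
NPart = 100000; Xw = 1.4; E0 = 20000.0; Ew = 500.0; CurrTotal = 2.5

X = rnorm(NPart, mean = 0.0, sd = Xw)
Y = rnorm(NPart, mean = 0.0, sd = Xw)
Energy = rnorm(NPart, mean = E0, sd = Ew)
SRCRaw = data.frame(X, Y, Energy)

# Remove extreme outliers (r > 5*Xw = 7.0 mm) and non-physical energies
SRCFile = subset(SRCRaw, ((X^2 + Y^2) <= 25.0*Xw^2) & (Energy > 0.0))

NLength = nrow(SRCFile)            # usually slightly less than NPart
dCurr = CurrTotal/NLength          # current per surviving electron
```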

A column vector of identical current values is constructed and appended to the *SRCFile* data frame with the *cbind()* command:

Curr = c(seq(from=dCurr,to=dCurr,length.out=NLength))
SRCFile = cbind(SRCFile,Curr)

We also define two plotting vectors for later use:

RVector = sqrt(SRCFile$X^2 + SRCFile$Y^2)
EVector = SRCFile$Energy

We’re ready to write the file. Some variables are set up in preparation:

FNameOut = "TestSRCGeneration.SRC"
HLine1 = "* GamBet Particle Escape File (Field Precision)"
HLine2 = paste("* NPart:",NLength)
HLine3 = "* Type Energy X Y Z Ux Uy Uz Curr"
HLine4 = "* =========================================================================================================="

Note that the actual number of electrons was written to *HLine2*. The following commands open the file (over-writing any previous version) and write the header using the *cat()* command:

cat(HLine1,file=FNameOut,append=FALSE,fill=TRUE)
cat(HLine2,file=FNameOut,append=TRUE,fill=TRUE)
cat(HLine3,file=FNameOut,append=TRUE,fill=TRUE)
cat(HLine4,file=FNameOut,append=TRUE,fill=TRUE)
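The append logic is easy to verify on a throwaway file; a sketch using a temporary file in place of *FNameOut*:

```r
tf = tempfile()                    # stand-in for FNameOut

# append=FALSE overwrites any previous version; append=TRUE adds lines;
# fill=TRUE terminates each cat() with a newline
cat("* GamBet Particle Escape File (Field Precision)",
    file = tf, append = FALSE, fill = TRUE)
cat("* NPart: 12345", file = tf, append = TRUE, fill = TRUE)

HeaderLines = readLines(tf)        # two comment lines
```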

We could add the table in its current form, but it would be nice to have the fixed-width format illustrated in the previous article. This command adds three initial spaces to the *Type* column:

SRCFile$Type = paste(" ",SRCFile$Type,sep="")

These commands convert the numbers in the columns to character representations with width 12 in either scientific or standard notation:

SRCFile$Energy = format(SRCFile$Energy,scientific=TRUE,digits=5,width=12)
SRCFile$X = format(SRCFile$X,scientific=TRUE,digits=5,width=12)
SRCFile$Y = format(SRCFile$Y,scientific=TRUE,digits=5,width=12)
SRCFile$Z = format(SRCFile$Z,scientific=TRUE,digits=5,width=12)
SRCFile$Ux = format(SRCFile$Ux,scientific=FALSE,digits=6,width=12)
SRCFile$Uy = format(SRCFile$Uy,scientific=FALSE,digits=6,width=12)
SRCFile$Uz = format(SRCFile$Uz,scientific=FALSE,digits=6,width=12)
SRCFile$Curr = format(SRCFile$Curr,scientific=TRUE,digits=6,width=12)

Note that *RVector* and *EVector* were defined when the data entries were still numbers. Finally, this command writes the *SRC* file:

write.table(SRCFile,file=FNameOut,sep=" ",append=TRUE,col.names=FALSE,quote=FALSE,row.names=FALSE)

Here is a sample of the result:

* GamBet Particle Escape File (Field Precision)
* NPart: 100000
* Type Energy X Y Z Ux Uy Uz Curr
* ==========================================================================================================
  E  2.0448e+04  -6.8232e-01   8.4354e-01  0e+00  0  0  1  2.5e-05
  E  1.9949e+04  -4.8475e-02   1.5114e+00  0e+00  0  0  1  2.5e-05
  E  2.1422e+04  -6.9644e-01  -4.9890e-01  0e+00  0  0  1  2.5e-05
  E  1.9291e+04  -1.9870e+00   5.5780e-01  0e+00  0  0  1  2.5e-05
  E  2.0032e+04  -5.6534e-01  -2.0669e-01  0e+00  0  0  1  2.5e-05
  E  2.0752e+04   1.2376e+00   7.0758e-01  0e+00  0  0  1  2.5e-05
...

The format isn’t precisely the way we would like. For example, despite the setting *digits*=6, the entries in *Ux* are 0 rather than 0.00000. This is a quirk of **R**. The program will not write more significant figures than the defining values, no matter what you tell it. Perhaps there is a solution, but I haven’t found it. **GamBet** and **GenDist** will recognize the above form. Figure 1 shows a **GenDist** scatter plot of model particles in the *x-y* plane.
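One possible workaround, offered as a suggestion rather than a tested recipe (I have not checked it against **GamBet**'s input parsing): *formatC()* uses C-style conversions and keeps the requested number of decimals even when the defining value is a whole number:

```r
Ux = c(0, 0, 0)

# format() collapses whole numbers: each entry prints as "0"
format(Ux, scientific = FALSE, digits = 6, width = 12)

# formatC() with format="f" honors the requested decimals: "     0.00000"
UxFixed = formatC(Ux, format = "f", digits = 5, width = 12)
```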

The following commands create histograms to display the distribution of electrons in radius and energy (Figure 2):

hist(RVector,breaks=100)
hist(EVector,breaks=100)

To conclude, we’ll modify the script so it may be called from a batch file. Suppose we want to make a series of **GamBet** runs with beams of differing width. For each run, we regenerate the input *SRC* file with **R** to represent the new width, run **GamBet** and rename critical output files so they are not over-written. Here’s a sample of a **Windows** batch file showing the first operation:

START /WAIT RScript Sect09DemoScript.R "C:\USINGRFORGAMBETSTATISTICALANALYSIS\Examples\Section09\" "1.4"
START /WAIT C:\fieldp_basic\gambet\gambet BatchDemo
RENAME BatchDemo.GLS BatchDemo01.GLS
RENAME BatchDemo.G3D BatchDemo01.G3D
...

The */WAIT* option in the *START* command ensures that the *SRC* file is available before starting **GamBet** and that there will not be a file access conflict while **GamBet** is running. Consider the first command. **RScript** is a special form of **R** for running scripts. To avoid typing the full path to the program every time, I modified the Windows path to include *C:\Program Files\R\R-3.2.0\bin\*. The next command-line parameter is the name of the script, followed by two character values that are passed to the script. The first is the working directory for the data and the second is the value of *Xw*.

Here’s how the script we have been discussing is modified to act as an autonomous program. The first set of commands becomes:

rm(list=objects())
args = commandArgs()
WorkDir = args[6]
setwd(WorkDir)

The second command stores the command line parameters in a character vector *args*. You may wonder, why start with *args[6]*? A listing of *args* gives the following information:

[1] C:\PROGRA~1\R\R-32~1.0\bin\i386\Rterm.exe
[2] --slave
[3] --no-restore
[4] --file=Sect09DemoScript.R
[5] --args
[6] C:\USINGRFORGAMBETSTATISTICALANALYSIS\Examples\Section09\
[7] 1.4

The first argument is the name of the running program, a typical convention in **Windows**. The next four are the values of options set in **RTerm**. The values of interest start at the sixth component. The next change is in the initialization section:

Xw = as.numeric(args[7])
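As an aside, *commandArgs()* accepts a *trailingOnly* option that strips the interpreter bookkeeping, so user parameters start at index 1 rather than 6; a sketch:

```r
# With trailingOnly=TRUE, only the values after --args are returned
args = commandArgs(trailingOnly = TRUE)

# In the batch-file example, this would give:
#   args[1] = working directory, args[2] = value of Xw
```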

Because command line parameters are strings, we force the value to be interpreted as a number to define *Xw*. Finally, suppose we want to inspect the histograms, even though the program is running in the background. The ending statements are modified to:

pdf(file="Test.pdf")
hist(RVector,breaks=100)
hist(EVector,breaks=100)
dev.off()

The results of all plot commands between *pdf()* and *dev.off()* are sent to the specified PDF file where they can be viewed after the run.

In conclusion, the **R** package has a vast set of available commands and options that could occupy several textbooks. In this set of short articles, I’ve tried to cover the fundamental core. One of my goals has been to clarify the sometimes arcane syntax of **R** so you have the background to explore additional functions. A book form of the articles with an index will be available soon on our website.


- **GamBet** escape files have a name of the form *FName.SRC*. They record the parameters of electrons, photons and positrons that escape from the solution volume. An example is the distribution of bremsstrahlung photons produced by an electron beam striking a target. Escape files may serve as the particle source for a subsequent simulation. You can also use **R** to prepare source files with mathematically-specified distributions, the topic of the next article.
- Files of the spatial distribution of deposited dose produced by the *MATRIX* commands in **GBView2** and **GBView3**.

We’ll begin with a discussion of the **GamBet** *SRC* file. Here’s an example, the output from a bremsstrahlung target:

* GamBet Particle Escape File (Field Precision)
* Output from run: BremGen
* DUnit:  1.0000E+03
* NPrimary: 1
* NShower: 500
*
* Type Energy X Y Z ux uy uz curr/flux
* ====================================================================================================
E 1.8276E+07 1.0000E+00 -2.2093E-01 -9.5704E-03 0.95867 -0.28441 -0.00726
P 1.0476E+06 1.0000E+00 -2.2099E-01 -9.6599E-03 0.96046 -0.27838 0.00421
P 1.4681E+07 1.0000E+00 -2.2079E-01 -9.4444E-03 0.96751 -0.24988 0.03846
P 1.1157E+05 1.0000E+00 -2.2125E-01 -9.0243E-03 0.93991 -0.33167 0.08099
...
P 3.7167E+06 1.0000E+00 -1.1189E-01 -6.2985E-02 0.99157 -0.11335 -0.06279
P 8.2996E+05 1.0000E+00 3.3150E-01 -2.6303E-01 0.91975 0.30757 -0.24386
ENDFILE

There is a file header consisting of eight comment lines marked by an asterisk. The header is followed by a large number of data lines. The file terminates with an *ENDFILE* marker. Each data line contains 8 or 9 entries separated by space delimiters. A data line contains the following components:

- The marker in the first column gives the type of particle: electron (*E* or *E-*), photon (*P*) or positron (*E+*). In the example, the output is a mixture of the primary electron beam and the secondary photons.
- The kinetic energy in eV.
- The position at the exit, (*x,y,z*).
- Components of a unit vector giving the particle direction, (*ux,uy,uz*).

The ninth column is present only in runs where flux weighting is assigned to the model input particles.

If we are going to perform a standard analysis on many files, it would be an advantage to create an **R** script where the user could choose the working directory and the *SRC* file interactively. Here is the section of the script to specify and to load a file. It introduces several new concepts and commands. Copy and paste the text to a script file window in **RStudio**:

library(utils)
if (!exists("WorkDir")) {
  WorkDir = choose.dir(default = "", caption = "Select folder")
}
setwd(WorkDir)
FDefault = paste(WorkDir,"\\*.src",sep="")
FName = choose.files(default=FDefault,caption="Select GamBet SRC file",multi=FALSE)
CheckFile = read.table(FName,header=FALSE,sep="",comment.char="*",fill=TRUE,nrows=1)
NColumn = ncol(CheckFile)
if(NColumn==8) {
  # Standard column names
  cnames = c("Type","Energy","X","Y","Z","ux","uy","uz")
} else {
  cnames = c("Type","Energy","X","Y","Z","ux","uy","uz","Flux")
}
SRCFile = read.table(FName,header=FALSE,sep="",comment.char="*",col.names=cnames,fill=TRUE)
NLength = nrow(SRCFile)
SRCFile = SRCFile[1:(NLength-1),]

The first command

library(utils)

occurs often in work with **R**. Many default commands are loaded when you start the **R** console. Although they have been sufficient for our previous work, they represent only a fraction of the available features. In this case, we load a library *utils* that supports interactive file operations. The next command lines constitute an *if* statement:

if (!exists("WorkDir")) {
  WorkDir = choose.dir(default = "", caption = "Select folder")
}

Simple *if* statements consist of a conditional line followed by any number of commands in braces. In this case, the commands are executed only if the object *WorkDir* does not exist (! designates the logical not operation). If we had a number of *SRC* files to analyze in the same directory, we would not want to reset the working directory every time. The *choose.dir()* operation brings up the standard Windows selection dialog of Fig. 1 and returns the path as *WorkDir*. The path is then set as the working directory.

After picking a directory, we’ll use the *choose.files()* command to pick a file. One of the command parameters is a default file name. We again use the paste operation to concatenate the path name and the default file name:

FDefault = paste(WorkDir,"\\*.src",sep="")

The result looks like this:

FDefault = "C:\\USINGRFORGAMBETSTATISTICALANALYSIS\\Examples\\Section08\\*.src"

The double backslash (an escaped backslash) is equivalent to the forward slash used in **R** path names. The command:

FName = choose.files(default=FDefault,caption="Select GamBet SRC file",multi=FALSE)

opens a standard dialog to return the name of a single file of type *.SRC in the working directory (Figure 2). The example file contains 47,892 entries.

The *read.table()* command provides a simple option for reading the file, but there are two challenges:

- We don’t know whether there will be 8 or 9 data columns.
- The line with *ENDFILE* does not contain any data.

These commands address the first problem:

CheckFile = read.table(FName,header=FALSE,sep="",comment.char="*",fill=TRUE,nrows=1)
NColumn = ncol(CheckFile)
if(NColumn==8) {
  cnames = c("Type","Energy","X","Y","Z","ux","uy","uz")
} else {
  cnames = c("Type","Energy","X","Y","Z","ux","uy","uz","Flux")
}

The *read.table()* command ignores the header comment lines and reads a single data line (*nrows*=1) to the dummy data frame *CheckFile*. The *ncol()* operator returns the number of data columns as *NColumn*. Depending on the value, we define a vector *cnames* of column names containing 8 or 9 components. The next *read.table()* command inputs the entire set of data lines plus the *ENDFILE* line:

SRCFile = read.table(FName,header=FALSE,sep="",comment.char="*",col.names=cnames,fill=TRUE)

Note that the number of columns and their names are set by *col.names=cnames*. In the absence of the *fill=TRUE* option, the operation would terminate with an error because the *ENDFILE* line has only one entry. The option specifies that a line with fewer entries than specified should be filled out with NA values. The last line is meaningless, so we delete it with the commands:

NLength = nrow(SRCFile)
SRCFile = SRCFile[1:(NLength-1),]
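The whole load procedure can be rehearsed on a miniature file before pointing it at real data; a self-contained sketch (file contents hypothetical, modeled on the format above):

```r
# Build a tiny SRC-style file: one comment, two data lines, ENDFILE marker
tf = tempfile(fileext = ".src")
writeLines(c("* header comment line",
             "E 1.8276E+07 1.0 -0.22 -0.0096 0.959 -0.284 -0.007",
             "P 1.0476E+06 1.0 -0.22 -0.0097 0.960 -0.278 0.004",
             "ENDFILE"), tf)

cnames = c("Type","Energy","X","Y","Z","ux","uy","uz")
SRCDemo = read.table(tf, header = FALSE, sep = "", comment.char = "*",
                     col.names = cnames, fill = TRUE)

SRCDemo = SRCDemo[1:(nrow(SRCDemo) - 1), ]   # drop the ENDFILE row
```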

Clearly, the procedure for loading a **GamBet** source file has several tricky features. Discovering them takes some trial and error. The advantage of a scripting environment like **R** is that the effort is required only once. The script we have discussed provides a template to load any *SRC* file.

Now that the file has been loaded as the data frame *SRCFile*, we can do some calculations. Copy and paste the following information below the load commands:

electrons = subset(SRCFile,Type=="E" | Type=="E-")
photons = subset(SRCFile,Type=="P")
elecavg = mean(electrons$Energy)
photavg = mean(photons$Energy)
hist(electrons$Energy,breaks=20,density=15,main="Electron energy spectrum",xlab="T (eV)",ylab="N/bin")
hist(photons$Energy,breaks=20,density=15,main="Photon energy spectrum",xlab="T (eV)",ylab="N/bin")

This command:

electrons = subset(SRCFile,Type=="E" | Type=="E-")

creates a data frame *electrons* that contains only rows where the *Type* is *E* or *E-*. We calculate the mean kinetic energy of electrons emerging from the target and create a histogram. Figure 3 shows the result. At this point, you should be able to figure out the meanings of options in the *hist()* command. Use the *Help* tab in **RStudio** and type in *hist* for more detailed information.
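The row-selection logic is easy to check on a toy data frame (values hypothetical):

```r
SRCDemo = data.frame(Type = c("E", "P", "E-", "P"),
                     Energy = c(1.0e6, 2.0e6, 3.0e6, 4.0e6))

# The | operator is the logical "or": keep rows of either electron type
electronsDemo = subset(SRCDemo, Type == "E" | Type == "E-")
photonsDemo = subset(SRCDemo, Type == "P")

elecavg = mean(electronsDemo$Energy)   # mean of 1.0e6 and 3.0e6
```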

To conclude, we’ll briefly discuss importing a matrix file from the two-dimensional **GBView2** post-processor. A matrix file is a set of values computed over a regular grid (uniform intervals in *x* and *y* or *z* and *r*). In this case, the quantity is total dose (deposited energy/mass), dose from primary electrons, dose from primary photons, etc. Here is a sample of the first part of a file:

Matrix of values from data file alumbeam.G2D
XMin:  0.0000E+00 YMin: -5.0000E-02
XMax:  1.0000E-01 YMax:  5.0000E-02
NX: 20 NY: 20
X Y NReg DoseTotal DoseElecP DosePhotP DosePosiP DoseElecS DosePhotS DosePosiS
=========================================
0.0000E+00 -5.0000E-02 1 3.3101E+04 2.6251E+04 0.0000E+00 0.0000E+00 6.8504E+03 0.0000E+00 0.0000E+00
5.0000E-03 -5.0000E-02 1 3.5977E+05 3.0285E+05 0.0000E+00 0.0000E+00 5.6925E+04 0.0000E+00 0.0000E+00
1.0000E-02 -5.0000E-02 1 3.5977E+05 3.0285E+05 0.0000E+00 0.0000E+00 5.6925E+04 0.0000E+00 0.0000E+00
1.5000E-02 -5.0000E-02 1 3.9705E+05 3.1705E+05 0.0000E+00 0.0000E+00 8.0007E+04 0.0000E+00 0.0000E+00
2.0000E-02 -5.0000E-02 1 3.9705E+05 3.1705E+05 0.0000E+00 0.0000E+00 8.0007E+04 0.0000E+00 0.0000E+00
2.5000E-02 -5.0000E-02 1 3.8548E+05 3.1449E+05 0.0000E+00 0.0000E+00 7.0986E+04 0.0000E+00 0.0000E+00
...

The data lines are space delimited and can easily be loaded with the *read.table()* command. The challenge is the header, where the lines are not comments. Because the header information is not required for the **R** analysis, we can simply omit it by using the *skip* option. The following code loads matrix file information, limits data to a scan along *x* at *y* = 0.0, carries out a fourth-order polynomial fit and plots the results (Figure 4).

cnames = c("x","y","NReg","DRate","DoseElecP","DosePhotP","DosePosiP","DoseElecS","DosePhotS","DosePosiS")
MatrixData = read.table(file="Demo.MTX",header=FALSE,sep="",col.names=cnames,skip=6)
AxisPlot = subset(MatrixData,y > -0.00001 & y < 0.00001)
plot(AxisPlot$x,AxisPlot$DRate)
DFit = lm(DRate~I(x)+I(x^2)+I(x^3)+I(x^4),AxisPlot)
PlotSeq = seq(from=0.0,to=0.10,length.out=101)
PlotPos = data.frame(x=PlotSeq)
lines(PlotPos$x,predict(DFit,newdata=PlotPos))
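The *skip* option can be verified on a miniature file of the same shape (six header lines, then data; values hypothetical):

```r
tf = tempfile(fileext = ".mtx")
writeLines(c("Matrix of values from data file demo.G2D",
             "XMin:  0.0000E+00 YMin: -5.0000E-02",
             "XMax:  1.0000E-01 YMax:  5.0000E-02",
             "NX: 20 NY: 20",
             "X Y NReg DRate",
             "====================",
             "0.0000E+00 0.0000E+00 1 3.3101E+04",
             "5.0000E-03 0.0000E+00 1 3.5977E+05"), tf)

# skip=6 jumps over the header, which is not marked as comments
MatrixDemo = read.table(tf, header = FALSE, sep = "", skip = 6,
                        col.names = c("x", "y", "NReg", "DRate"))
```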

**Footnotes**

[1] The entire series is available as a PDF book at http://www.fieldp.com/rintroduction.html.

[2] The *read.fortran()* command of **R** provides a path to import information from the binary output files of any Field Precision technical program.

[3] Contact us : techinfo@fieldp.com.

[4] Field Precision home page: www.fieldp.com.


Figure 1 illustrates the test calculation. Given a noisy signal, we want to determine if there is a resonance hidden within it. Such a response usually has the form of a Gaussian function

A exp[-(x-mu)^2/(2.0*sigma^2)]   [1]

We want to determine values of *A*, *mu* and *sigma* (the resonance width) such that the function gives the most probable fit to the data of Fig. 1. We also want to determine confidence levels for the fit. This is clearly a nonlinear problem because the parameters *mu* and *sigma* appear nonlinearly in the function.

The spreadsheet of Fig. 2 is used to generate data over the range 0.0 < *x* < 20.0. Values of *y* are determined by Eq. 1 with *A* = 20.0, *mu* = 8.58 and *sigma* = 1.06. The noise level is set as an adjustable parameter. Rather than save the spreadsheet as a *CSV* file, we copy the data section as shown and paste it into a text file, *peakdata.tab*. In this case, the entries on a line are separated by tab characters instead of commas. Therefore, we use a different command form to read the file:

PeakData = read.table("peakdata.tab",header=T,sep="")

The header sets the column names of the data frame *PeakData* to *y* and *x*. The option *sep=""* means that any white-space characters (including tabs) act as delimiters.

To run the example, copy and paste this text into the **R** script editor:

PeakData = read.table("peakdata_high.tab",header=T,sep="")
fit = nls(y ~ I(Amp*exp(-1.0*(Mu-x)^2/(2.0*Sigma^2))),data=PeakData,
  start=list(Amp=5.0,Mu=5.0,Sigma=5.0))
summary(fit)
plot(PeakData$x,PeakData$y)
new = data.frame(x = seq(min(PeakData$x),max(PeakData$x),len=200))
lines(new$x,predict(fit,newdata=new))
confint(fit,level=0.95)

The critical command line for the nonlinear fit is

fit = nls(y ~ I(Amp*exp(-1.0*(Mu-x)^2/(2.0*Sigma^2))),data=PeakData,
  start=list(Amp=5.0,Mu=5.0,Sigma=5.0))

The first argument of the *nls()* command is a defining formula:

y ~ I(Amp*exp(-1.0*(Mu-x)^2/(2.0*Sigma^2)))

It implies that the variable *y* depends on the independent variable *x* as well as parameters named *Amp*, *Mu* and *Sigma*. The dependence is declared explicitly and enclosed in the *I()* function to be safe. The second argument, *data=PeakData*, defines the input data source. The third argument,

start=list(Amp=5.0,Mu=5.0,Sigma=5.0)

needs some explanation. The iterative calculation involves varying parameter values and checking whether the overall error is reduced. The user must set reasonable initial values for the search. The start option requires an **R** list, a general set of values. The *list()* operator combines several objects into a list. In the form shown, the first item in the list has name *Amp* and value 5.0. Be aware that for some choices of parameters, the process may not converge or may converge to an alternate solution.
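The fit can be rehearsed end-to-end on synthetic data built from the generating values stated above (*A* = 20.0, *mu* = 8.58, *sigma* = 1.06). This is a sketch with a fixed seed; convergence from these start values is expected but, as noted, not guaranteed for arbitrary starts:

```r
set.seed(7)
x = seq(0.0, 20.0, length.out = 41)
# Gaussian resonance plus modest noise (generating values from the text)
y = 20.0*exp(-1.0*(8.58 - x)^2/(2.0*1.06^2)) + rnorm(41, sd = 0.5)
PeakData = data.frame(x, y)

fit = nls(y ~ I(Amp*exp(-1.0*(Mu - x)^2/(2.0*Sigma^2))), data = PeakData,
          start = list(Amp = 5.0, Mu = 5.0, Sigma = 5.0))

est = coef(fit)                    # estimates near the generating values
```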

The *summary()* command produces a console listing like this:

Formula: y ~ I(Amp * exp(-1 * (Mu - x)^2/(2 * Sigma^2)))

Parameters:
      Estimate Std. Error t value Pr(>|t|)
Amp   22.45079    1.86245   12.05 1.49e-14 ***
Mu     8.63087    0.08361  103.22  < 2e-16 ***
Sigma -0.87288    0.08361  -10.44 1.02e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.675 on 38 degrees of freedom
Number of iterations to convergence: 13
Achieved convergence tolerance: 5.277e-06

The next three commands plot the raw data and the fitting line following methods that we discussed in previous articles. Finally, there is a new command that gives the confidence interval:

confint(fit,level=0.95)

which produces a console listing like this:

           2.5%      97.5%
Amp   18.738008 26.3724502
Mu     8.460237  8.8129274
Sigma -1.071273 -0.7146478

The results have the following interpretation: each interval is the range within which we can have 95% confidence that the parameter value lies. In other words, for the above example we can have 95% confidence that the peak amplitude is between 18.738008 and 26.3724502.

With an understanding of the mechanics of the calculation, we can check out some results. To begin, a high level of noise in the spreadsheet (*Noise* = 5.0) leads to the data points and fitting line of Fig. 3. Here is a summary of numerical results:

Noise level: 5.0
       Estimate Std. Error       2.5%      97.5%  Generating
Amp    22.45079    1.86245  18.738008 26.3724502  20.0
Mu      8.63087    0.08361   8.460237  8.8129274  8.58
Sigma  -0.87288    0.08361  -1.071273 -0.7146478  1.06

The negative sign for *Sigma* is not an issue because the parameter always appears as a square. Because the peak is barely discernible in the noise, there is some error in the estimate of *Sigma*. Rerunning the example with a reduced noise level (*Noise* = 0.5) provides a check that the *nls()* command is operating correctly. Figure 4 shows the data points and fitting line. With less noise, the results are much closer to the generating values.

Noise level: 0.5
       Estimate Std. Error       2.5%      97.5%
Amp    20.02893    0.17535  19.673012 20.386997
Mu      8.57904    0.01084   8.557082  8.600996
Sigma   1.07258    0.01084   1.050487  1.095135


- **RStudio** remembers your screen setups, so it is not necessary to resize and reposition floating windows in every session.
- The script editor (upper left in Fig. 1) features syntax highlighting and an integrated debugger. You can keep multiple tabbed documents active.
- An additional area at upper-right shows all **R** objects currently defined. Double-clicking a data frame opens it in a data editor tab in the script window.
- The area at the lower right has multiple functions in addition to plot displays. Other tabs invoke a file manager, a package manager for **R** supplements and a *Help* page for quick access to information on **R** commands.

I’ll assume you’re using **RStudio** for the example of this article and following ones.

This article shows how to use **R** to fit a three-dimensional polynomial function to noisy data. A practical example is making an accurate interpolation of radiation dose at a point, *D(x,y,z)*, in a **GamBet** calculation. Here, the dose in any single element is subject to statistical variations that depend on the number of model input particles. The idea is to get an estimate at a test point [*x0,y0,z0*] by finding the most probable smooth function using values from a large number of nearby elements. The approach is valid when the physics implies that the quantity varies smoothly in space (*e.g.*, the variation of potential in electrostatics). The present calculation is the three-dimensional extension of the interpolation discussed in a previous article.

As with previous examples, we’ll need to create some test data. The measurement point is [*XAvg* = 23.45, *YAvg* = 10.58, *ZAvg* = -13.70] and the ideal variation of dose about the point is

D = 19.45 + 2.52*(x-XAvg) - 0.25*(x-XAvg)^2
          - 4.23*(y-YAvg) + 0.36*(y-YAvg)^2
          + 1.75*(z-ZAvg) - 0.41*(z-ZAvg)^2   [1]

The spacing of data points about the measurement point is *Dx* = 1.00, *Dy* = 1.25 and *Dz* = 0.90. In this case, we’ll use a computer program to record the data in a *CSV* file. For reference, here is the core of the FORTRAN program:

OPEN(FILE='multiregress.csv',UNIT=30)
WRITE (30,2000)
XAvg = 23.45
YAvg = 10.58
ZAvg = -13.70
Dx = 1.00
Dy = 1.25
Dz = 0.90
! Main data loop
DO Nz = -5,5
  z = ZAvg + REAL(Nz)*Dz
  DO Ny = -5,5
    y = YAvg + REAL(Ny)*Dy
    DO Nx = -5,5
      x = XAvg + REAL(Nx)*Dx
      CALL RANDOM_NUMBER(Zeta)
      Noise = 2.0*(0.5 - Zeta)
      D = 19.45 &
        + 2.52*(x-XAvg) - 0.25*(x-XAvg)**2 &
        - 4.23*(y-YAvg) + 0.36*(y-YAvg)**2 &
        + 1.75*(z-ZAvg) - 0.41*(z-ZAvg)**2 &
        + Noise
      WRITE (30,2100) x,y,z,D
    END DO
  END DO
END DO
CLOSE (30)

2000 FORMAT ('x,y,z,D')
2100 FORMAT (F14.8,',',F14.8,',',F14.8,',',F14.8)

The program creates a header and then writes 1331 data lines in *CSV* format. Each line contains the position and dose, [*x,y,z,D*]. Here are the initial entries in *multiregress.csv*:

x,y,z,D
18.45000076, 4.32999992, -18.19999886, 22.14448738
19.45000076, 4.32999992, -18.19999886, 28.94458199
20.45000076, 4.32999992, -18.19999886, 38.34004974
...

Run **R**, clear the console window (*Control-L*) and close any previous script. Copy and paste the following text into the script editor:

rm(list=objects())
setwd("C:/RExamples/MultiRegress")
DataSet = read.csv("multiregress.csv",header=T)
# Calculate the measurement point
XAvg=mean(DataSet$x)
YAvg=mean(DataSet$y)
ZAvg=mean(DataSet$z)
# Reference spatial variables to the measurement point
DataSet$x = DataSet$x - XAvg
DataSet$y = DataSet$y - YAvg
DataSet$z = DataSet$z - ZAvg
# Perform the fitting
FitModel = lm(D~x+y+z+I(x^2)+I(y^2)+I(z^2),DataSet)
summary(FitModel)
# Plot a scan along x through the measurement point
png(filename="XScan.png")
xscan = subset(DataSet,y < 0.001 & y > -0.001 & z < 0.001 & z > -0.001)
plot(xscan$x,xscan$D)
lines(xscan$x,predict(FitModel,newdata=xscan))
dev.off()

The first three commands perform operations that we have previously discussed: clear all **R** objects, set a working directory and read information from the *CSV* file created by the FORTRAN program. The information is stored in the data frame *DataSet*. Use *Control-R* in the script editor to execute the command lines. Note that *DataSet* appears in the *Environment* list of **RStudio**. Double-click on *DataSet* to open it in a data-editor tab. Here, you can inspect values and even modify them.

The first task in the analysis is to shift the spatial values so that they are referenced to the measurement point, as in Eq. [1].

XAvg=mean(DataSet$x)
YAvg=mean(DataSet$y)
ZAvg=mean(DataSet$z)
DataSet$x = DataSet$x - XAvg
DataSet$y = DataSet$y - YAvg
DataSet$z = DataSet$z - ZAvg

We use the *mean()* function under the assumption that the data points are centered about the measurement position. A single command performs the linear fit:

FitModel = lm(D~x+y+z+I(x^2)+I(y^2)+I(z^2),DataSet)
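The same *lm()* construction can be checked in one dimension with synthetic data built from the *x* coefficients of Eq. [1] (illustrative noise level, fixed seed so the check is repeatable):

```r
set.seed(3)
x = seq(-5.0, 5.0, by = 0.5)
# Quadratic from Eq. [1] in x alone, plus small noise
D = 19.45 + 2.52*x - 0.25*x^2 + rnorm(length(x), sd = 0.1)

# I() protects the quadratic term from formula-operator interpretation
FitDemo = lm(D ~ x + I(x^2), data = data.frame(x, D))
cf = coef(FitDemo)                 # estimates near 19.45, 2.52, -0.25
```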

The *summary()* command gives the following information:

             Estimate Std. Error  Generating
(Intercept) 19.540509   0.179142   19.45
x            2.560374   0.025733    2.52
y           -4.226011   0.020587   -4.23
z            1.737097   0.028593    1.75
I(x^2)      -0.265439   0.009214   -0.25
I(y^2)       0.358165   0.005897    0.36
I(z^2)      -0.404630   0.011375   -0.41

As with all statistical calculations, a visual indicator is a quick way to gauge the validity of the fit. The following commands create a two-dimensional plot, a scan of data points and the fitting function along *x* at *y* = 0.0, *z* = 0.0 (through the measurement point). The plot is saved as a *PNG* file in the working directory.

[1] png(filename="XScan.png")
[2] xscan = subset(DataSet,y < 0.001 & y > -0.001 & z < 0.001 & z > -0.001)
[3] plot(xscan$x,xscan$D)
[4] lines(xscan$x,predict(FitModel,newdata=xscan))
[5] dev.off()

Line [1] opens a file for *PNG* output, while Line [5] flushes the plot information to the file and closes it. Line [2] uses the *subset()* function to create a data frame, *xscan*, that contains only data points where *y* = 0.0 and *z* = 0.0. The first argument is the source data frame and the second argument is the criterion for picking included rows. There are two noteworthy features in the command expression:

- The symbol & denotes the logical *and* operation.
- The inclusion criterion has some slack to account for floating-point round-off errors. If you inspect *xscan* with the data editor, you will note that some values are on the order of 10^-8 rather than 0.0.

Line [3] plots the raw data points, while the *lines()* command in Line [4] applies the method discussed in a previous article to plot the fitting function. Similar operations were applied to create scans through the measurement point along *y* and *z* to produce the plots of Fig. 2.
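The *predict()* step can be checked in isolation: with noise-free data the linear fit is exact, so predictions at new points reproduce the generating line to machine precision. A minimal sketch (values hypothetical):

```r
x = seq(-5.0, 5.0, by = 1.0)
D = 1.0 + 2.0*x                    # noise-free line: fit recovers it exactly
FitLine = lm(D ~ x)

# Evaluate the fit at points not in the original data
new = data.frame(x = c(0.25, 0.50))
p = predict(FitLine, newdata = new)
```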

