(This is tenth and final article in the series *Using R for Gambet statistical analysis*.[1]) The **R** package includes functions to create a wide array of distributions of interest in mathematics. Nonetheless, there are many particle distributions of interest in physics (such as differential cross sections) that are not directly represented. This article introduces a method to sample from any probability function that can be approximated by a set of points.

Let’s start by creating some data to work with. We’ll model the kinetic energy distribution of positrons emitted in the β decay of Na22. The maximum value is 540 keV. To approximate the probability mass density, we’ll use one half cycle of a sine function skewed toward higher energies to represent Coulomb repulsion from the product nucleus. Copy and paste the following commands to the **RStudio** script editor:

rm(list=objects()) xmax = 540.0 nmax = 50 MassDens = function(x) { Out = sin(x*pi/(xmax))*(0.35+0.4*x/xmax) return(Out) } xval = c(seq(from=0.0,to=xmax,length.out=(nmax+1))) mdens = numeric(length=(nmax+1)) for (n in 1:(nmax+1)) { mdens[n] = MassDens(xval[n]) } plot(xval,mdens) curve(MassDens,0.0,xmax,xname="x",add=TRUE)

The commands

MassDens = function(x) { Out = sin(x*pi/(xmax))*(0.35+0.4*x/xmax) return(Out) }

define a function *MassDens* that follows the curve of Figure 1. The command

xval = c(seq(from=0.0,to=xmax,length.out=(nmax+1)))

creates a vector of 51 points equally spaced along the energy axis from 0.0 to 540.0 keV. The commands

mdens = numeric(length=(nmax+1)) for (n in 1:(nmax+1)) { mdens[n] = MassDens(xval[n]) }

create an empty vector to hold the probability mass density *p(x)* and fill the values with a loop that uses the *MassDens* function. At this point, the relative probability is sufficient — it is not necessary to worry about normalization.

To assign energy values for a particle distribution, we need the cumulative probability distribution, defined as

*P(x)* = ∫ *p(x’)dx’*

from *xmin* to *xmax*. The function *P(x)* equals the probability that the energy is less than or equal to *x*. It has a value of 0.0 at *xmin* and 1.0 at *xmax*. The following commands use the trapezoidal rule to perform the integral:

dx = xmax/nmax cdens = numeric(length=(nmax+1)) cdens[1] = 0.0 for (n in 2:(nmax+1)) { cdens[n] = cdens[n-1] + (mdens[n-1]+mdens[n])*dx/2.0 } cdens = cdens/cdens[51]

Note that the final command normalizes the function. The result is a vector *cdens* of 51 points to represent the cumulative distribution.

The cumulative probability *P(x)* is a monotonically-increasing function of *x* — there is a unique value of *x* for every value of *P(x)*. Therefore, we can view *x* as a function of *P(x)*, as illustrated by Fig. 2 created with the command:

plot(cdens,xval)

We can also make an accurate calculation of *x* for any *P(x)* using the spline interpolation functions of **R**. The line in Fig. 2 was created with the command:

lines(spline(cdens,xval))

Figure 3 illustrates the principle underlying of the sampling method. Suppose we want to create 10,000 particles. By the definition of the cumulative probability distribution, a total of 1000 of the particles should be contained within the interval 0.6 ≤ *P* ≤ 0.7. The graph show that these particles should be assigned energies in the range 320 keV ≤ *x* ≤ 360 keV. In other words, if ζ represents a uniform sequence of 10,000 numbers from 0.0 to 1.0, then the desired distribution will result if the energy values are assigned according to

*x*[*n*] = Inv*P*(ζ[*n*])

The **R** expression for the assignment operation uses a form of the *spline()* function that creates values at specified points:

NMax = 10000 dpoints = numeric(length=NMax) zetavals=c(seq(from=0.0,to=1.0,length.out=NMax)) ztemp = spline(cdens,xval,xout=zetavals) dpoints = ztemp$y hist(dpoints,breaks=35)

In this case, the *spline()* function returns a data frame containing both the independent and dependent values. The assigned energies are set equal to the dependent values, *dpoints = ztemp$y*. The top graph in Figure 4 shows a histogram of the resulting distribution.

The routine creates the same distribution each time the script is run. In some circumstances, you may want to add variations so that runs are statistically independent. In this case, the uniform sequence of ζ values may be replaced with a random-uniform distribution:

zetavals=runif(NMax,0.0,1.0)

The lower graph in Fig. 4 shows the result.

In conclusion, the **R** package has a vast set of available commands and options that could occupy several textbooks. In this tutorial, I’ve tried to cover the fundamental core. My goal has been to clarify the sometimes arcane syntax of R so you have the background to explore additional functions. A compendium of scripts and data files for the examples is available. To make a request, please contact us.

**Footnotes**

[1] The entire series is available as a PDF book at http://www.fieldp.com/rintroduction.html.

[2] Contact us : techinfo@fieldp.com.

[3] Field Precision home page: www.fieldp.com.