---
title: "The `mc2d` package"
author: "R. POUILLOT, M.-L. DELIGNETTE-MULLER, D.L. KELLY & J.-B. DENIS"
output:
  pdf_document:
    toc: true
    number_sections: true
    citation_package: natbib
geometry: "scale=0.8, centering"
biblio-style: plain
bibliography: docmcEnglish.bib
vignette: >
  %\VignetteIndexEntry{A Manual for mc2d: Tools for Two-Dimensional Monte-Carlo Simulations}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(dpi = 96, fig.width = 6, fig.height = 4, dev = "pdf")
```


This documentation is intended for readers with:

- A medium level of experience in R. Please refer to *An Introduction to R* available with the R distribution if needed.
- Some knowledge about Monte-Carlo simulation and Quantitative Risk Assessment (QRA).

The definitive reference for all function arguments remains the package documentation. It is recommended to try the examples while reading.

# Introduction

## What is `mc2d`?

`mc2d` means Two-Dimensional Monte-Carlo (*Monte-Carlo à 2 Dimensions*). This package provides:

- additional probability distributions;
- tools to construct One-Dimensional and Two-Dimensional Monte-Carlo simulations;
- tools to analyse One-Dimensional and Two-Dimensional Monte-Carlo simulations.

In a previous version, distribution-fitting tools were included; they have since been moved to the separate package `fitdistrplus` \cite{FITDISTRPLUS}.

`mc2d` was built for QRA in the Food Safety domain but can be used in other frameworks.

## What is Two-Dimensional Monte-Carlo Simulation (briefly)?

The following text and Figure \ref{fig:mc2d} are adapted from \cite{POUILLOT-2004} and \cite{POUILLOT-2007}. The principal reference remains \cite{CULLEN-FREY-1999}.

A QRA should reflect the **variability** in the risk and take into account the **uncertainty** associated with the risk estimate. Variability represents temporal, geographical and/or individual heterogeneity of the risk across the population. Uncertainty stems from a lack of perfect knowledge about the QRA model structure and parameters.[^1]

[^1]: In the engineering risk community, these are called *aleatoric uncertainty* for variability and *epistemic uncertainty* for uncertainty.

A two-dimensional Monte-Carlo simulation separates variability and uncertainty by sampling their respective distributions independently \cite{CULLEN-FREY-1999}. The method proceeds as follows (see Figure \ref{fig:mc2d}):

1.  Parameters are divided into four categories: *fixed*, *variable* (variability only), *uncertain* (uncertainty only), and *variable and uncertain*. For the last category, a hierarchical structure is required, e.g.: $$r \mid a,b \sim N(a,b)$$ where the normal distribution represents variability in $r$ conditional on uncertain parameters $a$ and $b$, with hyper-distributions such as $a \sim \text{Unif}(l_a, u_a)$ and $b \sim \text{Unif}(l_b, u_b)$.
2.  A set of uncertain parameters is randomly sampled from their distributions.
3.  The model is evaluated with a one-dimensional Monte-Carlo simulation of size $N_v$, treating uncertain parameters as fixed. Statistics (mean, SD, percentiles) are computed and stored.
4.  Steps 2--3 are repeated $N_u$ times.
5.  The $50^\text{th}$ percentile (median) of each statistic is the point estimate; the $2.5^\text{th}$ and $97.5^\text{th}$ percentiles constitute the 95% credible interval.

\begin{figure}
\noindent \begin{centering}
\includegraphics[angle=270, width=1\textwidth]{mc2dsaumon}
\par\end{centering}
\caption{\label{fig:mc2d}Schematic Representation of a Two-Dimensional Monte-Carlo Simulation.}
\end{figure}

More formally, in its simplest version the framework is a chain of three random variables: $$p \rightarrow \pi \rightarrow Y$$ with marginal distribution $[p]$ and conditional distributions $[\pi \mid p]$ and $[Y \mid \pi]$, under the assumption $[p,\pi,Y]=[p][\pi \mid p][Y \mid \pi]$.

- $Y$ is the variable of interest.
- $\pi$ is the parameter governing the distribution of $Y$; its uncertainty is characterised through $[\pi \mid p]$.
- $p$ is the parameter governing the uncertainty of $\pi$; its distribution $[p]$ is assumed known.

A two-dimensional Monte-Carlo simulation provides a bundle of $N_u$ distributions $[Y \mid \pi=\pi_i]$ where $\pi_i, i=\{1,\ldots,N_u\}$ are drawn from $[\pi \mid p]$.

`mc2d` uses arrays of (at least) two dimensions: the first dimension reflects variability, the second reflects uncertainty.

## A basic example {#sec:Example1}

**Quantitative Risk Assessment: *Escherichia coli* O157:H7 infection linked to the consumption of frozen ground beef in \<3 year old children.**

*Warning*: the data are fictitious and the model is over-simplified; results should not be interpreted.

The model assumes:

- *E. coli* O157:H7 are randomly distributed in a batch with mean concentration $c=10$ CFU/g.
- No bacterial growth occurs (product is frozen until cooked just before consumption).
- 2.7% of consumers cook their beef rare, 37.3% medium, and 60.0% well done.
- Bacterial inactivation by cooking: no inactivation (rare), 1/5 surviving (medium), 1/50 surviving (well done).
- Serving size $s$ for \<3 year children follows a Gamma distribution: $shape=3.93$, $rate=0.0806$.
- The dose-response is a one-hit model with probability of illness per hit $r=0.001$.

*What is the risk of illness in the population that consumed the contaminated lot?*

### One-Dimensional Monte-Carlo Simulation

All distributions represent variability only: $$c = 10, \quad
i \sim emp(\{1, 1/5, 1/50\},\{0.027, 0.373, 0.600\}), \quad
s \sim \text{Gamma}(3.93, 0.0806)$$ $$n \sim \text{Poisson}(c \times i \times s), \quad r = 0.001, \quad P = 1-(1-r)^n$$

```{r sh0, echo=FALSE}
set.seed(666)
options("width"=90, "digits"=3)
knitr::opts_chunk$set(fig.path = "figs/")
```

```{r sh1}
library(mc2d)
ndvar(1001)
conc    <- 10
cook    <- mcstoc(rempiricalD, values = c(1, 1/5, 1/50), prob = c(0.027, 0.373, 0.600))
serving <- mcstoc(rgamma, shape = 3.93, rate = 0.0806)
expo    <- conc * cook * serving
dose    <- mcstoc(rpois, lambda = expo)
r       <- 0.001
risk    <- 1 - (1 - r)^dose
EC1     <- mc(cook, serving, expo, dose, risk)
print(EC1)
summary(EC1)
```

The `print` output shows for each node: the variable mode, `nsv` (variability dimension size), `nsu` (uncertainty dimension size), `nva` (number of variates), basic statistics (min, mean, median, max), number of missing values (`Nas`), the node type (`0` = fixed, `V` = variable, `U` = uncertain, `VU` = variable and uncertain), and the output level `outm`.

This one-dimensional simulation gives a mean risk of approximately 5%, with 2.5% of the population having a risk greater than 20.3%.

### Two-Dimensional Monte-Carlo Simulation

Adding uncertainty: the mean concentration $c$ follows $N(10,2)$ and the parameter $r$ follows $\text{Unif}(0.0005, 0.0015)$: $$c \sim N(10,2), \quad i,s \text{ as before}, \quad n \sim \text{Poisson}(c \times i \times s),
\quad r \sim \text{Unif}(0.0005, 0.0015), \quad P = 1-(1-r)^n$$

```{r sh2}
ndunc(101)
conc    <- mcstoc(rnorm,       type = "U",  mean = 10, sd = 2)
cook    <- mcstoc(rempiricalD, type = "V",  values = c(1, 1/5, 1/50), prob = c(0.027, 0.373, 0.600))
serving <- mcstoc(rgamma,      type = "V",  shape = 3.93, rate = 0.0806)
expo    <- conc * cook * serving
dose    <- mcstoc(rpois,       type = "VU", lambda = expo)
r       <- mcstoc(runif,       type = "U",  min = 0.0005, max = 0.0015)
risk    <- 1 - (1 - r)^dose
EC2     <- mc(conc, cook, serving, expo, dose, r, risk)
print(EC2, digits = 2)
summary(EC2)
```

The `type` argument indicates whether a distribution represents variability (`"V"`, default), uncertainty (`"U"`), or both (`"VU"`). The median of the 101 simulations gives a best estimate of 0.0457 with a 95% credible interval of [0.0238, 0.0794].

# Basic Principles and Functions

A typical `mc2d` session:

1.  Choose an empirical or parametric distribution for each parameter (`fitdistrplus` \cite{FITDISTRPLUS} is a convenient fitting tool).
2.  Construct an `mcnode` object for each parameter (`mcdata`, `mcstoc`).
3.  Group `mcnode` objects into an `mc` object (`mc`).
4.  Study the `mc` object via summaries, graphs, and sensitivity analysis (`summary.mc`, `plot.mc`, `tornado`, `tornadounc`).

## Preliminary Step

Load `mc2d` at the start of your R session with `library(mc2d)`. Set the default Monte-Carlo dimensions with `ndvar()` (variability) and `ndunc()` (uncertainty).

## The `mcnode` Object

### `mcnode` Object Structure

An `mcnode` is the basic element of an `mc` object — it is associated with one variable, while an `mc` is a set of associated variables.

An `mcnode` is an array of dimension $(nsv \times nsu \times nvariates)$.[^2] Four types exist:

[^2]: In this section we only consider univariate nodes with $nvariates=1$.

- **`V` mcnode** (Variability): dimension $(nsv \times 1 \times nvariates)$.
- **`U` mcnode** (Uncertainty): dimension $(1 \times nsu \times nvariates)$.
- **`VU` mcnode** (Variability and Uncertainty): dimension $(nsv \times nsu \times nvariates)$.
- **`0` mcnode** (Neither): dimension $(1 \times 1 \times nvariates)$. Not needed in the univariate context but useful for multivariate nodes (Section \ref{sec:Multivar}).

\begin{figure}
\noindent \begin{centering}
\includegraphics[angle=270, width=.7\textwidth]{Illustration}
\par\end{centering}
\caption{\label{fig:Structure}Structure of the various `mcnode` objects.}
\end{figure}

Ways to construct an `mcnode`:

1.  `mcstoc` — from random number generating functions.
2.  `mcdata` — from data.
3.  Direct operations on `mcnode` objects.
4.  `mcprobtree` — from a mixture of distributions (probability tree).
5.  Functions such as `==`, `>`, `is.na`, `is.finite` applied to an existing `mcnode`.

### The `mcstoc` function {#sub:mcstoc}

```         
mcstoc(func=runif, type=c("V","U","VU","0"), ...,
       nsv=ndvar(), nsu=ndunc(), nvariates=1, outm="each",
       nsample="n", seed=NULL, rtrunc=FALSE, linf=-Inf, lsup=Inf, lhs=FALSE)
```

- `func`: random-data function or its name as a character. Table \ref{tab:Distributions} lists available distributions.
- `type`: type of `mcnode` to build. Default: `"V"`.
- `...`: arguments passed to `func` (except the sample size). *All must be named.*
- `nsv`, `nsu`: number of samples in variability/uncertainty dimensions.
- `nvariates`: number of variates (Section \ref{sec:Multivar}).
- `outm`: default output for multivariate nodes (Section \ref{sec:Multivar}).
- `nsample`: name of the sample-size argument of `func` (usually `"n"`; use `"nn"` for `rhyper` and `rwilcox`).
- `seed`: random seed. If `NULL`, seed is unchanged.
- `rtrunc`: truncate the distribution between `linf` and `lsup`.
- `lhs`: use Latin Hypercube Sampling.

\begin{table}
\caption{\label{tab:Distributions}Available distributions}
\centering{}\begin{tabular}{|c|c|c|c|c|c|c|}
\hline
Package & Distribution & Function & Size arg. & Other parameters & trunc & lhs\tabularnewline
\hline\hline
`stats` & beta            & `rbeta`      & `n`  & `shape1, shape2, ncp`        & Y & Y\tabularnewline\hline
        & binomial        & `rbinom`     & `n`  & `size, prob`                 & Y & Y\tabularnewline\hline
        & Cauchy          & `rcauchy`    & `n`  & `location, scale`            & Y & Y\tabularnewline\hline
        & chi-squared     & `rchisq`     & `n`  & `df, ncp`                    & Y & Y\tabularnewline\hline
        & exponential     & `rexp`       & `n`  & `rate`                       & Y & Y\tabularnewline\hline
        & F               & `rf`         & `n`  & `df1, df2, ncp`              & Y & Y\tabularnewline\hline
        & gamma           & `rgamma`     & `n`  & `shape, rate (or scale)`     & Y & Y\tabularnewline\hline
        & geometric       & `rgeom`      & `n`  & `prob`                       & Y & Y\tabularnewline\hline
        & hypergeometric  & `rhyper`     & `nn` & `m, n, k`                    & Y & Y\tabularnewline\hline
        & lognormal       & `rlnorm`     & `n`  & `meanlog, sdlog`             & Y & Y\tabularnewline\hline
        & logistic        & `rlogis`     & `n`  & `location, scale`            & Y & Y\tabularnewline\hline
        & neg.\ binomial  & `rnbinom`    & `n`  & `size, prob (or mu)`         & Y & Y\tabularnewline\hline
        & normal          & `rnorm`      & `n`  & `mean, sd`                   & Y & Y\tabularnewline\hline
        & Poisson         & `rpois`      & `n`  & `lambda`                     & Y & Y\tabularnewline\hline
        & Student's t     & `rt`         & `n`  & `df, ncp`                    & Y & Y\tabularnewline\hline
        & uniform         & `runif`      & `n`  & `min, max`                   & Y & Y\tabularnewline\hline
        & Weibull         & `rweibull`   & `n`  & `shape, scale`               & Y & Y\tabularnewline\hline
        & Wilcoxon        & `rwilcox`    & `nn` & `m, n`                       & Y & Y\tabularnewline\hline
`mc2d`  & Bernoulli       & `rbern`      & `n`  & `prob`                       & Y & Y\tabularnewline\hline
        & emp.\ discrete  & `rempiricalD`& `n`  & `values, prob`               & Y & Y\tabularnewline\hline
        & emp.\ cont.     & `rempiricalC`& `n`  & `min, max, values, prob`     & Y & Y\tabularnewline\hline
        & PERT            & `rpert`      & `n`  & `min, mode, max, shape`      & Y & Y\tabularnewline\hline
        & triangular      & `rtriang`    & `n`  & `min, mode, max`             & Y & Y\tabularnewline\hline
        & gen.\ beta      & `rbetagen`   & `n`  & `shape1,shape2,min,max,ncp`  & Y & Y\tabularnewline\hline
        & multinomial     & `rmultinomial`& `n` & `size, prob`                 & N & N\tabularnewline\hline
        & Dirichlet       & `rdirichlet` & `n`  & `alpha`                      & N & N\tabularnewline\hline
        & mv.\ normal     & `rmultinormal`& `n` & `mean, sigma`                & N & N\tabularnewline\hline
        & beta subjective & `rbetasubj`  & `n`  & `min, mode, mean, max`       & Y & Y\tabularnewline\hline
        & min.\ info.     & `rmqi`       & `n`  & `mqi, mqi.quantile, ...`     & Y & Y\tabularnewline\hline
\end{tabular}
\end{table}

In our example, `mcstoc` specified `conc` (normal), `cook` (empirical discrete), `serving` (gamma), and `dose` (Poisson with `mcnode` argument `lambda`).

```{r sh3, eval=FALSE}
conc    <- mcstoc(rnorm,       type = "U",  mean = 10, sd = 2)
cook    <- mcstoc(rempiricalD, type = "V",  values = c(1, 1/5, 1/50), prob = c(0.027, 0.373, 0.600))
serving <- mcstoc(rgamma,      type = "V",  shape = 3.93, rate = 0.0806)
dose    <- mcstoc(rpois,       type = "VU", lambda = expo)
r       <- mcstoc(runif,       type = "U",  min = 0.0005, max = 0.0015)
```

A normal distribution $N(2,3)$ truncated on $[1.5, 2]$ with Latin Hypercube Sampling:[^3]

[^3]: The mean and SD of the non-truncated distribution are not preserved after truncation.

```{r sh3bis}
x <- mcstoc(rnorm, mean = 2, sd = 3, rtrunc = TRUE, linf = 1.5, lsup = 2, lhs = TRUE)
summary(x)
```

Additional distributions in `mc2d`: `rbern` (Bernoulli), `rempiricalD` (empirical discrete), `rpert` \cite{VOSE-2000}, `rtriang`, `rdirichlet`, `rmultinormal`, `rmqi` (min. quantile info.). The multinomial distribution is vectorized as `rmultinomial` (use instead of `stats::rmultinom`).

### The `mcdata` function {#sub:mcnode}

```         
mcdata(data, type=c("V","U","VU","0"), nsv=ndvar(), nsu=ndunc(), nvariates=1, outm="each")
```

See the function documentation for accepted data sizes and types. Example — placing `TRUE` in a `"U"` node for half the simulations:

```{r sh3ter}
nu  <- ndunc()
tmp <- (1:nu) > (nu/2)
mcdata(tmp, type = "U")
```

### Operations on an `mcnode`

`mcnode` objects support arithmetic operations with coherent type propagation (Table \ref{tab:Ops}).[^4]

[^4]: These rules differ from standard R recycling rules.

- `0` + `0` = `0`; `0` + `V` = `V`; `0` + `U` = `U`; `0` + `VU` = `VU`
- `V` + `V` = `V`; `V` + `U` = `VU`[^5]; `V` + `VU` = `VU`[^6]
- `U` + `U` = `U`; `U` + `VU` = `VU`[^7]; `VU` + `VU` = `VU`

[^5]: `U` is recycled by row; `V` classically by column.

[^6]: `V` recycled by column.

[^7]: `U` recycled by row.

\begin{table}
\caption{\label{tab:Ops}`mcnode` combination types}
\centering{}\begin{tabular}{|c||c|c|c|c|}
\hline
 & 0 & V & U & VU\tabularnewline\hline\hline
0  & 0  & V  & U  & VU\tabularnewline\hline
V  & V  & V  & VU & VU\tabularnewline\hline
U  & U  & VU & U  & VU\tabularnewline\hline
VU & VU & VU & VU & VU\tabularnewline\hline
\end{tabular}
\end{table}

```{r sh4, eval=FALSE}
expo <- conc * cook * serving   # U * V * V → VU
risk <- 1 - (1 - r)^dose        # U * VU → VU
```

### The `mcprobtree` function {#sub:The-mcprobtree-function}

`mcprobtree` builds an `mcnode` as a mixture of distributions. If the microbiologists are 75% sure that $conc \sim N(10,2)$ and 25% sure that $conc \sim U(8,12)$:[^8]

[^8]: Alternatives for `whichdist`: `mcstoc(rempiricalD, type="U", values=c(0,1), prob=c(75,25))` or `mcstoc(rbern, type="U", prob=0.25)`.

```{r sh4b}
conc1    <- mcstoc(rnorm,  type = "U", mean = 10, sd = 2)
conc2    <- mcstoc(runif,  type = "U", min = 8, max = 12)
whichdist <- c(0.75, 0.25)
concbis   <- mcprobtree(whichdist, list("0" = conc1, "1" = conc2), type = "U")
```

`mcprobtree` can also generate samples from a mixture for variability.

### Other functions for constructing an `mcnode`

Comparison operators (`==`, `<`, `<=`, `>=`, `>`) generate an `mcnode` when applied to one. Special functions `is.na`, `is.nan`, `is.finite`, `is.infinite` are also implemented.

```{r sh5}
cook < 1
suppressWarnings(tmp <- log(mcstoc(runif, min = -1, max = 1)))
tmp
is.na(tmp)
```

### Specifying a correlation between `mcnode`s

A Spearman rank correlation structure between 2 or more nodes can be specified with `cornode`, which uses the Iman & Conover method \cite{IMAN-CONOVER-1982}.[^9]

[^9]: The resulting correlation (\~0.4) is an approximation because a discrete distribution (`cook`: 3 categories) is correlated with a continuous distribution (`serving`).

```{r sh5bis}
cornode(cook, serving, target = 0.5, result = TRUE)
```

Correlations can be specified between `V`, `U`, or `VU` nodes, or between one `V` node and multiple `VU` nodes. A multivariate normal distribution (`rmultinormal`) is another way to specify correlations assuming normality.

## The `mc` Object

Once `mcnode` objects are constructed, group them into an `mc` object for analysis. An `mc` is a list of `mcnode`s. It can be constructed with `mc()`, `evalmcmod()`, or within `evalmccut()`.

### The `mc` function

```         
mc(..., name=NULL, devname=FALSE)
```

`...` are `mcnode` or `mc` objects. In our example:

```{r sh6, eval=FALSE}
EC2 <- mc(conc, cook, serving, expo, dose, r, risk)
print(EC2)
summary(EC2)
```

### The `mcmodel` and `evalmcmod` functions

`mcmodel` wraps a model expression for later evaluation with `evalmcmod`. Use this once the model is validated, to re-run it with different dimensions or seeds in one line.

```{r sh10}
modelEC3 <- mcmodel({
  conc    <- mcstoc(rnorm,       type = "U",  mean = 10, sd = 2)
  cook    <- mcstoc(rempiricalD, type = "V",  values = c(1, 1/5, 1/50), prob = c(0.027, 0.373, 0.600))
  serving <- mcstoc(rgamma,      type = "V",  shape = 3.93, rate = 0.0806)
  r       <- mcstoc(runif,       type = "U",  min = 0.0005, max = 0.0015)
  expo    <- conc * cook * serving
  dose    <- mcstoc(rpois,       type = "VU", lambda = expo)
  risk    <- 1 - (1 - r)^dose
  mc(conc, cook, serving, expo, dose, r, risk)
})
modelEC3
```

Notes:

- The model is wrapped between `{` and `}`.
- Any valid R code is allowed inside.[^10]
- The model must end with an `mc()` call.

[^10]: The simulation dimensions can be referenced via `ndvar()` and `ndunc()` if needed.

```         
evalmcmod(expr, nsv=ndvar(), nsu=ndunc(), seed=NULL)
```

```{r sh11, eval=FALSE}
EC3 <- evalmcmod(modelEC3, nsv = 100, nsu = 10,   seed = 666)
EC4 <- evalmcmod(modelEC3, nsv = 100, nsu = 1000, seed = 666)
```

### The `mcmodelcut` and `evalmccut` functions

For high-dimensional models that exceed R's memory limit, `evalmccut` evaluates the model in a loop over the uncertainty dimension, computing and storing statistics at each step.[^11]

[^11]: Using a `tornado` inside the model should be avoided as it slows `evalmccut` considerably.

```{r sh12, eval=FALSE}
modEC4 <- mcmodelcut({
  ## First block: unidimensional nodes
  {
    cook    <- mcstoc(rempiricalD, type = "V",  values = c(0, 1/5, 1/50), prob = c(0.027, 0.373, 0.6))
    serving <- mcstoc(rgamma,      type = "V",  shape = 3.93, rate = 0.0806)
    conc    <- mcstoc(rnorm,       type = "U",  mean = 10, sd = 2)
    r       <- mcstoc(runif,       type = "U",  min = 5e-04, max = 0.0015)
  }
  ## Second block: two-dimensional nodes → mc object
  {
    expo <- conc * cook * serving
    dose <- mcstoc(rpois, type = "VU", lambda = expo)
    risk <- 1 - (1 - r)^dose
    res  <- mc(conc, cook, serving, expo, dose, r, risk)
  }
  ## Third block: statistics
  {
    list(
      sum    = summary(res),
      plot   = plot(res, draw = FALSE),
      minmax = lapply(res, range),
      tor    = tornado(res),
      et     = sapply(res, sd)
    )
  }
})
res <- evalmccut(modEC4, nsv = 10001, nsu = 101, seed = 666)
summary(res)
```

## Analysing an `mc` Object

### The `summary` function

```         
summary(object, probs=c(0,0.025,0.25,0.5,0.75,0.975,1), lim=c(0.025,0.975), ...)
```

Quantiles in `probs` are evaluated in the variability dimension; then the median and `lim` quantiles are evaluated across those statistics.

```{r sh14}
tmp <- summary(EC2, probs = c(0.995, 0.999), digits = 12)
tmp$risk
```

### The `hist` function

```         
hist(x, griddim=NULL, xlab=names(x), ylab="Frequency", main="", ...)
```

Provides histograms of each `mcnode` in the `mc` object. Note: variability and uncertainty are collapsed, so the histogram may be approximate.

```{r sh15, eval=FALSE}
hist(EC2)
```

```{r Function-hist, echo=FALSE, fig.cap="The `hist` function."}
hist(EC2)
```

From `mc2d` version 0.2-0, `gghist` provides a `ggplot2`-based histogram, returning a `ggplot2` object that can be further customised with `+` operators.

### The `plot` function

```         
plot(x, prec=0.001, stat=c("median","mean"), lim=c(0.025,0.25,0.75,0.975),
     na.rm=TRUE, griddim=NULL, xlab=NULL, ylab="Fn(x)", main="",
     draw=TRUE, paint=TRUE, ...)
```

Plots the empirical CDF with uncertainty envelope. The 0.25 and 0.75 quantiles (default `lim`) form the inner envelope.

```{r sh17, eval=FALSE}
plot(EC2)
```

```{r Function-plot, echo=FALSE, fig.cap="The `plot` function."}
plot(EC2)
```

From `mc2d` version 0.2-0, `ggplotmc` provides a `ggplot2`-based version.

`spaghetti` draws individual empirical CDF step-functions, one per uncertainty sample, instead of the summary envelope produced by `plot`. The `maxlines` argument caps the number of lines drawn.

```         
spaghetti(x, griddim=NULL, xlab=names(x), ylab="F(n)", main="", maxlines=100, ...)
```

```{r sh17ter, eval=FALSE}
spaghetti(EC2)
```

```{r Function-spaghetti, echo=FALSE, fig.cap="The `spaghetti` function."}
spaghetti(EC2)
```

`ggspaghetti` provides the same plot via `ggplot2`.

All `mcnode` objects support the same `print`, `summary`, `plot`, and `hist` methods. `ggplotmc`, `gghist`, and `ggspaghetti` on an `mcnode` allow post-processing of the graph.

### The `tornado` function

```         
tornado(x, output=length(x), use="all.obs", method=c("spearman","kendall","pearson"),
        lim=c(0.025,0.975))
```

Computes Spearman (default) rank correlation between nodes. `output` specifies the output node (default: last). `tornado` creates a `tornado` object with a `plot` method.

```{r sh19}
torEC2 <- tornado(EC2)
plot(torEC2)
```

```{r Function-plottor, echo=FALSE, fig.cap="The `plot.tornado` function."}
plot(torEC2)
```

From `mc2d` version 0.2-0, `ggtornado` provides a `ggplot2`-based version.

### The `tornadounc` function

```         
tornadounc(mc, output=length(mc), quant=c(0.5,0.75,0.975),
           use="all.obs", method=c("spearman","kendall","pearson"), ...)
```

Examines the impact of uncertainty on an output estimate. Computes Spearman rank correlation between statistics of `mcnode` objects in the variability dimension. `quant` specifies which quantiles to use.

```{r sh19c}
tornadounc(EC2, output = "risk", quant = .99)
```

The output shows the impact of uncertain `U` nodes and statistics (mean, median, 99th percentile) computed in the variability dimension of `VU` nodes.

### The `mcratio` function

Provides variability, uncertainty, and combined measures \cite{OZKAYNAK-2009}. Given:

- **A** = median of uncertainty for the median of variability
- **B** = median of uncertainty for the 97.5th percentile of variability
- **C** = 97.5th percentile of uncertainty for the median of variability
- **D** = 97.5th percentile of uncertainty for the 97.5th percentile of variability

Ratios: Variability = B/A; Uncertainty = C/A; Overall Uncertainty = D/A.

```{r sh19d}
mcratio(risk)
```

## Other Functions and `mc` Objects

`mc` objects are simply lists of three-dimensional arrays. `$` extracts an `mcnode`. `unmc` removes attributes and collapses unit dimensions to return vectors, matrices, or arrays.

```{r sh19ter}
tmp  <- unmc(EC2, drop = TRUE)
dimu <- ncol(tmp$risk)
coef <- sapply(1:dimu, function(x) lm(tmp$risk[, x] ~ tmp$dose[, x])$coef)
apply(coef, 1, summary)
```

# Multivariate Nodes {#sec:Multivar}

The `nvariates` dimension is the third dimension of an `mcnode`. It is mandatory for multivariate distributions, and useful in other situations.

Note that:

```{r sh19ter_b}
mcstoc(runif, nvariates = 3, min = c(1, 2, 3), max = 4)
```

does **not** produce a node with 3 variates each having a different lower limit — the vector `c(1,2,3)` is recycled along the variability dimension (standard R behaviour). Use instead:

```{r sh19quatr}
lim <- mcdata(c(1, 2, 3), type = "0", nvariates = 3)
mcstoc(runif, nvariates = 3, min = lim, max = 4)
```

## Multivariate Nodes for Multivariate Distributions

The primary use of multivariate nodes is for multivariate distributions: Dirichlet, multinomial, multivariate normal, and empirical.

**Example 1.** Three-member families each buy 500 g of ground beef. The proportions eaten by the baby, older brother, and mother follow a Dirichlet $(\alpha=(2,3,5))$ distribution.

```{r sh20}
(p <- mcstoc(rdirichlet,    type = "U",  nvariates = 3, alpha = c(2, 3, 5)))
s  <- mcstoc(rmultinomial, type = "VU", nvariates = 3, size = 500, prob = p)
summary(s)
```

**Example 2.** Each family member eats a normal distribution of steak (100, 150, 250 g mean) with positive correlation between the children's servings and negative with the mother's.

```{r sh21}
sigma <- matrix(c(10, 2, -5, 2, 10, -5, -5, -5, 10), ncol = 3)
(x   <- mcstoc(rmultinormal, type = "V", nvariates = 3, mean = c(100, 150, 250),
               sigma = as.vector(sigma)))
cor(x[, 1, ])
```

Both `mean` and `sigma` can be variable or uncertain.[^12] Example with uncertain mean:

[^12]: `rmultinormal` is a vectorized version of `rmvnorm` from `mvtnorm`.

```{r sh21b}
m   <- mcdata(c(100, 150, 250), type = "0", nvariates = 3)
mun <- mcstoc(rnorm,        type = "U",  nvariates = 3, mean = m, sd = 20)
x   <- mcstoc(rmultinormal, type = "VU", nvariates = 3, mean = mun, sigma = as.vector(sigma))
cor(x[, 1, ])
```

**Example 3.** Non-parametric bootstrap: 6 individuals eat 100 g, 12 eat 150 g, 6 eat 170 g, and 6 eat 200 g \cite{CULLEN-FREY-1999}.

```{r sh22}
val <- c(100, 150, 170, 200)
pr  <- c(6, 12, 6, 6)
out <- c("min", "mean", "max")
(x  <- mcstoc(rempiricalD, type = "U",  outm = out, nvariates = 30, values = val, prob = pr))
mcstoc(rempiricalD, type = "VU", values = x)
```

The `outm` option controls which statistics to show: `"none"`, `"each"` (default), or a vector of function names applied across all 30 variates.

## Multivariate Nodes as a Third Dimension for Multiple Options

Assume uncertainty in `conc`: 75% sure $conc \sim N(10,2)$, 25% sure $conc \sim U(8,12)$. Build a bivariate node and propagate both hypotheses simultaneously.

```{r sh23}
conc1 <- mcstoc(rnorm,  type = "U",  mean = 10, sd = 2)
conc2 <- mcstoc(runif,  type = "U",  min = 8, max = 12)
conc  <- mcdata(c(conc1, conc2), type = "U", nvariates = 2)

cook    <- mcstoc(rempiricalD, type = "V",  values = c(1, 1/5, 1/50), prob = c(0.027, 0.373, 0.600))
serving <- mcstoc(rgamma,      type = "V",  shape = 3.93, rate = 0.0806)
expo    <- conc * cook * serving
dose    <- mcstoc(rpois,       type = "VU", nvariates = 2, lambda = expo)
r       <- mcstoc(runif,       type = "U",  min = 0.0005, max = 0.0015)
risk    <- 1 - (1 - r)^dose
EC5     <- mc(conc, risk)
summary(EC5)
```

(Remember to specify `nvariates = 2` in `mcstoc` for `dose` — `mc2d` cannot infer it.)

## Multivariate Nodes as a Third Dimension for Multiple Contaminants

Two contaminants with uncertain concentrations $N(10,2)$ and $N(14,2)$:[^13]

[^13]: A correlation between contaminants could be specified via `rmultinormal`.

```{r sh24}
mconc   <- mcdata(c(10, 14), type = "0",  nvariates = 2)
conc    <- mcstoc(rnorm,       type = "U",  nvariates = 2, mean = mconc, sd = 2)
cook    <- mcstoc(rempiricalD, type = "V",  values = c(1, 1/5, 1/50), prob = c(0.027, 0.373, 0.600))
serving <- mcstoc(rgamma,      type = "V",  shape = 3.93, rate = 0.0806)
expo    <- conc * cook * serving
dose    <- mcstoc(rpois,       type = "VU", nvariates = 2, lambda = expo)
dosetot <- mcapply(dose, margin = "variates", fun = sum)
r       <- mcstoc(runif,       type = "U",  min = 0.0005, max = 0.0015)
risk    <- 1 - (1 - r)^dosetot
EC6     <- mc(conc, risk)
summary(EC6)
```

# Another Example: A QRA of Waterborne Cryptosporidiosis in France

Adapted from \cite{POUILLOT-2004}. Goal: evaluate the risk of infection with *Cryptosporidium parvum* from consumption of tap water given $n$ oocysts/100 l observed in a storage reservoir (simplified model).

## Tap Water Consumption Model

```{r Crypto1, echo=FALSE}
library(mc2d)
inca <- structure(c(0, 22.08, 60, 64.4, 72, 82.8, 90, 96, 100, 110, 120,  137.5, 144, 150, 160, 162.5, 165, 180, 182.5, 184, 192, 192.5,  200, 216, 220, 225, 230, 240, 250, 264, 270, 276, 288, 290, 300,  304, 312.8, 320, 322, 325, 330, 336, 340, 350, 360, 370, 375,  380, 384, 390, 400, 414, 420, 425, 430, 432, 432.5, 440, 442,  450, 456, 460, 460.8, 464, 470, 470.4, 480, 490, 500, 504, 510,  510.4, 516, 520, 525, 525.6, 528, 530, 540, 544, 550, 552, 560,  562, 565, 570, 576, 580, 582.5, 584, 585.6, 590, 596, 600, 606,  610, 614, 620, 625, 630, 635.4, 640, 648, 650, 656.2, 660, 664.4,  670, 670.4, 672, 675, 680, 682, 690, 696, 700, 710, 716, 720,  730, 730.4, 740, 744, 750, 756, 760, 774.8, 780, 784, 792, 796,  800, 810, 820, 828, 830, 840, 850, 850.4, 860, 864, 866.4, 870,  880, 890, 900, 908, 910, 915.2, 920, 930, 936, 950, 960, 970,  980, 984, 986.4, 990, 996, 1000, 1015.2, 1020, 1028, 1032, 1036,  1040, 1042, 1050, 1070, 1075, 1078.8, 1080, 1090, 1096, 1100,  1110, 1120, 1126.4, 1128, 1130, 1140, 1148, 1150, 1152, 1160,  1170, 1175, 1176.2, 1190, 1196, 1200, 1214, 1220, 1230, 1240,  1248, 1250, 1260, 1276, 1280, 1290, 1296, 1300, 1320, 1322, 1330,  1340, 1350, 1360, 1370, 1386.4, 1400, 1410, 1414, 1420, 1430,  1440, 1446, 1450, 1460, 1480, 1500, 1520, 1530, 1550, 1560, 1600,  1650, 1680, 1696, 1700, 1710, 1720, 1750, 1760, 1800, 1830, 1840,  1850, 1900, 1920, 1936, 1954, 1980, 1990, 2000, 2014, 2050, 2100,  2200, 2220, 2248, 2250, 2276, 2300, 2310, 2340, 2400, 2550, 2568,  2700, 2720, 2784, 2820, 2876, 3000, 3100, 3108, 3200, 2578, 7,  1, 8, 14, 3, 1, 1, 10, 1, 250, 1, 2, 120, 8, 6, 1, 5, 3, 12,  5, 5, 375, 2, 8, 7, 41, 408, 53, 4, 24, 7, 3, 2, 217, 1, 1, 44,  9, 1, 31, 1, 1, 17, 294, 5, 3, 9, 3, 12, 525, 5, 23, 1, 3, 4,  1, 28, 3, 154, 2, 5, 1, 2, 6, 1, 299, 8, 148, 1, 2, 1, 1, 8,  3, 1, 2, 14, 20, 1, 18, 2, 20, 6, 1, 8, 2, 8, 1, 1, 1, 4, 1,  487, 3, 5, 1, 7, 1, 5, 1, 24, 3, 17, 1, 42, 1, 2, 1, 1, 1, 16,  1, 3, 1, 30, 4, 1, 183, 4, 1, 5, 1, 141, 1, 14, 1, 12, 1, 2,  1, 206, 6, 2, 1, 4, 92, 10, 1, 5, 1, 3, 5, 5, 2, 87, 1, 1, 1,  5, 5, 4, 4, 78, 1, 3, 2, 1, 16, 1, 133, 1, 5, 1, 1, 1, 4, 1,  43, 1, 1, 1, 30, 1, 1, 7, 2, 6, 1, 1, 3, 3, 1, 10, 1, 5, 1, 1,  1, 1, 1, 159, 2, 1, 1, 10, 1, 16, 4, 2, 5, 3, 1, 3, 11, 4, 1,  2, 12, 5, 1, 1, 44, 3, 2, 1, 2, 17, 1, 4, 1, 1, 17, 1, 1, 3,  4, 18, 14, 4, 1, 2, 1, 1, 1, 2, 12, 1, 2, 1, 1, 1, 1, 1, 3, 1,  20, 1, 1, 1, 7, 1, 1, 3, 1, 2, 2, 1, 6, 1, 1, 1, 1, 1, 1, 1,  2, 1, 1, 2), .Dim = c(270L, 2L))
inca <- rep(inca[, 1], inca[, 2]) / 1000
```

We have raw data on daily tap water consumption from 1,180 consumers (`inca`, histogram below). We could use the empirical distribution directly:

```{r Function-inca, echo=FALSE, fig.cap="Histogram of daily tap water intake."}
hist(inca, xlab = "l/day", freq = FALSE, main = "")
```

```{r Crypto3}
ndvar(1001)
ndunc(101)
mcstoc(rempiricalD, type = "V", values = inca)
```

Instead, we use `fitdistrplus` \cite{FITDISTRPLUS}. The data include many zeros (days without tap water consumption). We model them as a mixture.

```{r Crypto4, fig.show='hide'}
library(fitdistrplus)
pzero    <- sum(inca == 0) / length(inca)
inca_non_0 <- inca[inca != 0]
descdist(inca_non_0)
```

```{r Function-descdist, echo=FALSE, fig.cap="Graph from `descdist`.", results='hide', message=FALSE}
descdist(inca_non_0)
```

From the shape descriptor plot above, we try a lognormal distribution:

```{r Crypto6}
Adj_water <- fitdist(inca_non_0, "lnorm", method = "mle")
meanlog   <- Adj_water$est[1]
sdlog     <- Adj_water$est[2]
summary(Adj_water)
```

The fit is satisfactory. Uncertainty around the MLE estimates could be incorporated via `bootdist`:

```{r Crypto7bis, eval=FALSE}
Boot       <- bootdist(Adj_water, bootmethod = "param", niter = ndunc())
Mean_conso <- mcdata(Boot$estim$meanlog, type = "U")
Sd_conso   <- mcdata(Boot$estim$sdlog,   type = "U")
conso1     <- mcstoc(rlnorm, type = "VU", meanlog = Mean_conso, sdlog = Sd_conso)
```

For simplicity, we ignore that uncertainty and use `mcprobtree` to build the mixture:

```{r Crypto8}
conso0 <- mcdata(0, type = "V")
conso1 <- mcstoc(rlnorm, type = "V", meanlog = meanlog, sdlog = sdlog)
v      <- mcprobtree(c(pzero, 1 - pzero), list("0" = conso0, "1" = conso1), type = "V")
summary(v)
```

## The Dose-Response Model

Bootstrap from data `datDR` \cite{CHAPPELL-1996}. A function `DR` with argument `n` is defined and passed to `mcstoc`:

```{r Crypto9}
datDR <- list(dose = c(30, 100, 300, 500, 1000, 10000, 100000, 1000000),
              pi   = c(2, 4, 2, 5, 2, 3, 1, 1),
              ni   = c(5, 8, 3, 6, 2, 3, 1, 1))

estDR <- function(pos, ref) {
  suppressWarnings(
    -glm(cbind(ref$ni - pos, pos) ~ ref$dose + 0,
         binomial(link = "log"))$coefficients)
}

ml <- 1 - exp(-estDR(datDR$pi, datDR) * datDR$dose)

DR <- function(n) {
  boot <- matrix(rbinom(length(datDR$dose) * n, datDR$ni, ml), nrow = length(datDR$dose))
  apply(boot, 2, estDR, ref = datDR)
}
r <- mcstoc(DR, type = "U")
summary(r)
```

## The Model

```{r Crypto10}
Rr <- mcstoc(rbeta, type = "U", shape1 = 2.65, shape2 = 3.64)
w  <- mcstoc(rbeta, type = "V", shape1 = 2.6,  shape2 = 3.4)
```

Given $O_o = 2$ oocysts observed in 100 l, the expected number of oocysts in the sample `l`:

```{r Crypto11}
Oo <- 2
l  <- (Oo + mcstoc(rnbinom, type = "U", size = Oo + 1, prob = Rr)) / 100
```

The expected number drunk by individuals and the risk ($\times 10000$):

```{r Crypto12}
Or <- l * v * w
P  <- 10000 * (1 - exp(-r * Or))
summary(P)
```

This can be compared to Table 2 in \cite{POUILLOT-2004}. Results for $O_o = \{0,1,2,5,10,20,50,100,1000\}$ can be obtained in one step using:

```{r Crypto13, eval=FALSE}
Oo <- mcdata(c(0, 1, 2, 5, 10, 20, 50, 100, 1000), type = "0", nvariates = 9)
```

# As a Conclusion {.unnumbered}

We hope that `mc2d` will help risk assessors to construct and analyse their models, and to develop two-dimensional simulations. *Please report any bugs to [rpouillot\@yahoo.fr](mailto:rpouillot@yahoo.fr){.email}.*
