
An Introduction to Modular Monte-Carlo Risk Analysis with mcmodule
Natalia Ciria
Source:vignettes/intro.Rmd
intro.RmdIntroduction
This is a gentle and casual welcome to the wonders of modular Monte
Carlo risk analysis and the use of mcmodule.
For a more formal approach, I invite you to read the package official vignette.
I developed this new R package because of the lack of suitable tools to handle complex risk analysis models, with thousands of parameters, hundreds of cases, dozens of scenarios, and several pathways (like farmR!SK).
mcmodule is a framework for building modular Monte Carlo
risk analysis models. It extends the capabilities of mc2d
to make working with multiple risk pathways, variates, and scenarios
easier. The package includes tools for creating stochastic objects from
data frames, visualizing results, and performing uncertainty,
sensitivity, and convergence analysis.
A simple risk assessment
Let’s imagine we want to buy a heifer from a friend. We know their farm is infected with pathogen A, a disease that your farm is free from. To reduce risk, we plan to perform a diagnostic test before bringing the heifer to our farm. We want to calculate the probability of introducing the disease if we purchase one heifer that tests negative.
This risk assessment can be performed using base R random sampling functions:
set.seed(123)
n_iterations <- 10000
# PARAMETERS
# Within-herd prevalence
w_prev <- runif(n_iterations, min = 0.15, max = 0.2)
# Test sensitivity
test_sensi <- runif(n_iterations, min = 0.89, max = 0.91)
# Probability an animal is tested in origin
test_origin <- 1 # Yes
# EXPRESSIONS
# Probability that an animal in an infected herd is infected (a = animal)
inf_a <- w_prev
# Probability an animal is tested and is a false negative
# (test specificity assumed to be 100%)
false_neg_a <- inf_a * test_origin * (1 - test_sensi)
# Probability an animal is not tested
no_test_a <- inf_a * (1 - test_origin)
# Probability an animal is not detected
no_detect_a <- false_neg_a + no_test_a
# RESULT
summary(no_detect_a)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01353 0.01617 0.01743 0.01750 0.01878 0.02193
A similar approach can be implemented with mc2d (mc2d-2?). This package
provides additional probability distributions (such as rpert) and other
useful tools for analysing Monte Carlo simulations.
library(mc2d)
set.seed(123)
n_iterations <- 10000
# Within-herd prevalence
w_prev <- mcstoc(runif, min = 0.15, max = 0.2,
nsu = n_iterations, type="U")
# Test sensitivity
test_sensi <- mcstoc(rpert, min = 0.89, mode = 0.9, max = 0.91,
nsu = n_iterations, type="U")
# Probability an animal is tested in origin
test_origin <- mcdata(1, type="0") #Yes
# EXPRESSIONS
# Probability that an animal in an infected herd is infected (a = animal)
inf_a <- w_prev
# Probability an animal is tested and is a false negative
# (test specificity assumed to be 100%)
false_neg_a <- inf_a * test_origin * (1 - test_sensi)
# Probability an animal is not tested
no_test_a <- inf_a * (1 - test_origin)
# Probability an animal is not detected
no_detect_a <- false_neg_a + no_test_a
mc_model<-mc(w_prev, inf_a, test_origin, test_sensi,
false_neg_a, no_test_a, no_detect_a)
# RESULT
hist(mc_model)
no_detect_a## node mode nsv nsu nva variate min mean median max Nas type outm
## 1 x numeric 1 10000 1 1 0.0138 0.0175 0.0174 0.0218 0 U each
Multiple risk assessments at once
In the previous example, we calculated the risk for one specific case. However, we know that this farm is also positive for pathogen B, so it would be also interesting to calculate the risk of introducing it as well. Pathogen B has different within-herd prevalence and test sensitivity than Pathogen A.
To estimate the risk for both pathogens with our previous models, we could:
Copy and paste the code twice with different parameters (against all good coding practices)
Wrap the code in a function and call it twice using each pathogen’s parameters as arguments
Create a loop
While these options work, they become messy or computationally intensive when the number of parameters or different situations to simulate increases.
The package mc2d offers a clever solution to this
scalability problem: variates. In this package, parameters are stored as
mcnode class objects. These objects are arrays of
numbers that represent random variables and have three dimensions:
variability × uncertainty × variates.

In the previous example, our stochastic nodes only had uncertainty dimension. However, we can now use the variates dimension to calculate the risk of introduction of both pathogens at the same time.
set.seed(123)
n_iterations <- 10000
# Within-herd prevalence
w_prev_min <- mcdata(c(a = 0.15, b = 0.45), nvariates = 2, type="0")
w_prev_max <- mcdata(c(a = 0.2, b = 0.6), nvariates = 2, type="0")
w_prev <- mcstoc(runif, min = w_prev_min, max = w_prev_max,
nsu = n_iterations, nvariates = 2, type="U")
# Test sensitivity
test_sensi_min <- mcdata(c(a = 0.89, b = 0.80), nvariates = 2, type = "0")
test_sensi_mode <- mcdata(c(a = 0.9, b = 0.85), nvariates = 2, type = "0")
test_sensi_max <- mcdata(c(a = 0.91, b = 0.90), nvariates = 2, type = "0")
test_sensi <- mcstoc(rpert, min = test_sensi_min,
mode = test_sensi_mode, max = test_sensi_max,
nsu = n_iterations, nvariates = 2, type="U")
# Probability an animal is tested in origin
test_origin <- mcdata(c(a = 1, b = 1), nvariates = 2, type="0")
# EXPRESSIONS
# Probability that an animal in an infected herd is infected (a = animal)
inf_a <- w_prev
# Probability an animal is tested and is a false negative
# (test specificity assumed to be 100%)
false_neg_a <- inf_a * test_origin * (1 - test_sensi)
# Probability an animal is not tested
no_test_a <- inf_a * (1 - test_origin)
# Probability an animal is not detected
no_detect_a <- false_neg_a + no_test_a
mc_model<-mc(w_prev, inf_a, test_origin, test_sensi,
false_neg_a, no_test_a, no_detect_a)
# RESULT
no_detect_a## node mode nsv nsu nva variate min mean median max Nas type outm
## 1 x numeric 1 10000 2 1 0.0139 0.0175 0.0174 0.0217 0 U each
## 2 x numeric 1 10000 2 2 0.0477 0.0787 0.0783 0.1178 0 U each
Instead of manually typing the parameter values, you can also use
columns from a data table in mcdata(). A useful template
for setting up risk analysis models using mc2d, with custom
functions to facilitate data manipulation and visualization, can be
found in this repository: https://github.com/NataliaCiria/risk_analysis_template.
When to use mcmodule?
The mc2d multivariate approach works well for basic
multivariate risk analysis. However, if instead of purchasing one cow,
you’re dealing with multiple cattle purchases, from different farms,
across different pathogens, scenarios, and age categories, or modeling
multiple risk pathways with different what-if scenarios, this approach
becomes unwieldy.
mcmodule addresses these challenges by providing
functions for multivariate operations and
modular management of the risk model. It automates the
process of creating mcnodes and assigns metadata to them (making it easy
to identify which variate corresponds to which data row). Thanks to this
mcnode metadata, it enables row-matching between nodes with different
variates, combines probabilities across variates, and calculates
multilevel trials. As your risk analysis grows, you can create separate
modules for different pathways, each with independent parameters,
expressions, and scenarios that can later be connected into a complete
model.
This package is particularly useful for:
Working with complex models that involve multiple pathways, pathogens, or scenarios simultaneously
Dealing with large parameter sets (hundreds or thousands of parameters)
Handling different numbers of variates across different parts of your model that need to be combined
Creating modular risk assessments where different components need to be developed independently but later integrated (for example in collaborative projects)
Performing sophisticated sensitivity analyses across multiple model components
However, for simpler analyses, such as single pathway models,
exploratory work, small models with few parameters, one-off analyses or
learning risk assessment mcmodule’s additional structure
may be unnecessary.
Installing mcmodule
Now let’s explore this new package! It’s about to be submitted to CRAN, but since it’s not there yet, we’ll install it from GitHub instead.
# install.packages("devtools")
devtools::install_github("NataliaCiria/mcmodule")And we load the package in our R session. Easy-peasy, ready to go!
Other recommended packages to load along with mcmodule are:
Building an mcmodule
To quickly understand the key components of an mcmodule, we’ll start by building one using the animal imports example included in the package. For a more detailed view of each component, refer to the model elements section in the package vignette.
Data
Let’s consider a scenario where we want to evaluate the risk of introducing pathogen A and pathogen B into our region from animal imports from different regions (north, south, east, and west). We have gathered the following data:
-
animal_imports: number of animal imports with their mean and standard deviation values per region, and the number of exporting farms in each region.animal_imports## origin farms_n animals_n_mean animals_n_sd ## 1 nord 5 100 6 ## 2 south 10 130 10 ## 3 east 7 140 12 -
prevalence_region: estimates for both herd and within-herd prevalence ranges for pathogens A and B, as well as an indicator of how often tests are performed in originprevalence_region## pathogen origin h_prev_min h_prev_max w_prev_min w_prev_max test_origin ## 1 a nord 0.08 0.10 0.15 0.2 sometimes ## 2 a south 0.02 0.05 0.15 0.2 sometimes ## 3 a east 0.10 0.15 0.15 0.2 never ## 4 b nord 0.50 0.70 0.45 0.6 always ## 5 b south 0.25 0.30 0.37 0.4 sometimes ## 6 b east 0.30 0.50 0.45 0.6 unknown -
test_sensitivity: estimates of test sensitivity values for pathogen A and Btest_sensitivity## pathogen test_sensi_min test_sensi_mode test_sensi_max ## 1 a 0.89 0.90 0.91 ## 2 b 0.80 0.85 0.90
Now we will use dplyr::left_join() to create our imports
module data:
imports_data<-prevalence_region%>%
left_join(animal_imports)%>%
left_join(test_sensitivity)%>%
relocate(pathogen, origin, test_origin)## Joining with `by = join_by(origin)`
## Joining with `by = join_by(pathogen)`
Data keys
From now on we will use only the merged imports_data
table. However, it is useful to understand which input dataset each
parameter comes from, as each dataset provides information for different
keys. In this context, keys are fields that (combined) uniquely identify
each row in a table. In our example:
animal_importsprovided information by region of"origin"prevalence_regionprovided information by"pathogen"and region of"origin"test_sensitivityprovided information by"pathogen"only
The resulting merged table, imports_data, will therefore
have two keys: "pathogen" and "origin".
However, not all parameters will use both keys, for example,
"test_sensi" only has information by
"pathogen". Knowing the keys for each parameter is crucial
when performing multivariate operations, such as calculating totals
(in(sec-calculating-totals?)).
To make these relationships explicit in the model, we need to provide the data keys. These are defined in a list with one element for each input dataset, specifying both the columns and the keys for each dataset.
mcnodes table
With values and keys established, we still need some information to build our stochastic parameters. The mcnode table specifies how to build mcnodes from the data table. It specifies which parameters are included in the model, the type of parameters (those with an mc_func are stochastic), and what columns to look for in the data table to build this mcnodes (the name of the mcnode, or another variable in the data columns), as well as transformations that are usefull to encode categorical data values into mcnodes that must always be numeric.
mcnode: Name of the Monte Carlo node (required)
description: Description of the parameter
mc_func: Probability distribution
from_variable: Column name, if it comes from a column with a name different to the mcnode
transformation: Transformation to be applied to the original column values
sensi_analysis: Whether to include in sensitivity analysis
Here we have the imports_mctable for our example. While
the mctable can be hard-coded in R, it’s more efficient to prepare it in
a CSV or other external file. This approach also allows the table to be
included as part of the model documentation.
| mcnode | description | mc_func | from_variable | transformation | sensi_analysis |
|---|---|---|---|---|---|
| h_prev | Herd prevalence | runif | NA | NA | TRUE |
| w_prev | Within herd prevalence | runif | NA | NA | TRUE |
| test_sensi | Test sensitivity | rpert | NA | NA | TRUE |
| farms_n | Number of farms exporting animals | NA | NA | NA | FALSE |
| animals_n | Number of animals exported per farm | rnorm | NA | NA | FALSE |
| test_origin_unk | Unknown probability of the animals being tested in origin (true = unknown) | NA | test_origin | value==“unknown” | FALSE |
| test_origin | Probability of the animals being tested in origin | NA | NA | ifelse(value == “always”, 1, ifelse(value == “sometimes”, 0.5, ifelse(value == “never”, 0, NA))) | FALSE |
The data table and the mctable must complement each other:
mcnodes without a
mc_func(likefarms_n), needs the matching column name ("farms_n") in the data table-
mcnodes with an
mc_func, you need columns for each probability distribution argument in the data table. For example:h_prevwithrunifdistribution requires"h_prev_min"and"h_prev_max"animals_nwithrnormdistribution requires"animals_n_mean"and"animals_n_sd"
For encoding categorical variables as mcnodes (or any other data
transformation), you can use any R code with value as a
placeholder for the mcnode name or column name (specified in
from_variable)
Expressions
Finally, we need to write the model’s mathematical expression. This
expression should ideally include only arithmetic operations, not R
functions (with some exceptions that will be covered later in (sec-tricks-and-tweaks?)).
We’ll use the same expressions introduced in (sec-introduction?).
However, we’ll wrap it using quote() so it isn’t executed
immediately but stored for later evaluation with
eval_model().
imports_exp<-quote({
# Probability that an animal in an infected herd is infected (a = animal)
inf_a <- w_prev
# Probability an animal is tested and is a false negative
# (test specificity assumed to be 100%)
false_neg_a <- inf_a * test_origin * (1 - test_sensi)
# Probability an animal is not tested
no_test_a <- inf_a * (1 - test_origin)
# Probability an animal is not detected
no_detect_a <- false_neg_a + no_test_a
})Evaluating a model
With all components in place, we’re now ready to create our first
mcmodule using eval_module().
imports<-eval_module(
exp = c(imports=imports_exp),
data = imports_data,
mctable = imports_mctable,
data_keys = imports_data_keys
)##
## imports evaluated
##
## mcmodule created (expressions: imports)
class(imports)## [1] "mcmodule"
An mcmodule is an S3 object class, and it is essentially a list that contains all risk assessment components in a structured format.
names(imports)## [1] "data" "exp" "node_list" "modules"
The mcmodule contains the input data and mathematical
expressions (exp) that ensure traceability. All input and
calculated parameters are stored in node_list. Each node
contains not only the mcnode itself but also important metadata: node
type (input or output), source dataset and columns, keys, calculation
method, and more. The specific metadata varies depending on the node’s
characteristics. Here are a few examples:
imports$node_list$w_prev## $type
## [1] "in_node"
##
## $mc_func
## [1] "runif"
##
## $description
## [1] "Within herd prevalence"
##
## $inputs_col
## [1] "w_prev_min" "w_prev_max"
##
## $input_dataset
## [1] "prevalence_region"
##
## $keys
## [1] "pathogen" "origin"
##
## $module
## [1] "imports"
##
## $mc_name
## [1] "w_prev"
##
## $mcnode
## node mode nsv nsu nva variate min mean median max Nas type outm
## 1 x numeric 1001 1 6 1 0.15 0.175 0.175 0.2 0 V each
## 2 x numeric 1001 1 6 2 0.15 0.175 0.174 0.2 0 V each
## 3 x numeric 1001 1 6 3 0.15 0.175 0.174 0.2 0 V each
## 4 x numeric 1001 1 6 4 0.45 0.524 0.525 0.6 0 V each
## 5 x numeric 1001 1 6 5 0.37 0.385 0.386 0.4 0 V each
## 6 x numeric 1001 1 6 6 0.45 0.523 0.524 0.6 0 V each
##
## $data_name
## [1] "imports_data"
imports$node_list$no_detect_a## $node_exp
## [1] "false_neg_a + no_test_a"
##
## $type
## [1] "out_node"
##
## $inputs
## [1] "false_neg_a" "no_test_a"
##
## $module
## [1] "imports"
##
## $mc_name
## [1] "no_detect_a"
##
## $keys
## [1] "pathogen" "origin"
##
## $param
## [1] "false_neg_a" "no_test_a"
##
## $mcnode
## node mode nsv nsu nva variate min mean median max Nas type outm
## 1 x numeric 1001 1 6 1 0.0822 0.0961 0.0960 0.110 0 V each
## 2 x numeric 1001 1 6 2 0.0824 0.0960 0.0959 0.110 0 V each
## 3 x numeric 1001 1 6 3 0.1500 0.1746 0.1742 0.200 0 V each
## 4 x numeric 1001 1 6 4 0.0500 0.0787 0.0781 0.110 0 V each
## 5 x numeric 1001 1 6 5 0.2059 0.2214 0.2214 0.237 0 V each
## 6 x numeric 1001 1 6 6 0.4504 0.5233 0.5241 0.600 0 V each
##
## $data_name
## [1] "imports_data"
Now that we have an mcmodule, we can begin exploring its possibilities!
Summarizing
In the imports mcmodule, we can already see the raw mcnode results
for the probability of an imported animal not being detected
(no_detect_a). However, it’s difficult to determine which
pathogen or region these results refer to. The mc_summary()
function solves this problem by linking mcnode results with their key
columns in the data. Note that while the printed summary looks similar
to the raw mcnode, it’s actually just a dataframe containing statistical
measures, whereas the actual mcnode is a large array of numbers with
dimensions (uncertainty × 1 × variates),
mc_summary(mcmodule = imports, mc_name = "no_detect_a")## mc_name pathogen origin mean sd Min 2.5%
## 1 no_detect_a a nord 0.09607419 0.007983987 0.08215801 0.08318831
## 2 no_detect_a a south 0.09600944 0.007877740 0.08241757 0.08325282
## 3 no_detect_a a east 0.17458727 0.014431815 0.15001204 0.15091040
## 4 no_detect_a b nord 0.07870798 0.011697425 0.05000413 0.05776186
## 5 no_detect_a b south 0.22143628 0.006212484 0.20587390 0.21013928
## 6 no_detect_a b east 0.52329144 0.043320684 0.45044560 0.45356043
## 25% 50% 75% 97.5% Max nsv Na's
## 1 0.08903559 0.09597030 0.10302211 0.1094443 0.1104595 1001 0
## 2 0.08913769 0.09594299 0.10261766 0.1093425 0.1104777 1001 0
## 3 0.16228149 0.17419132 0.18660810 0.1992568 0.1999452 1001 0
## 4 0.07012427 0.07811934 0.08762139 0.1022697 0.1100034 1001 0
## 5 0.21712775 0.22136949 0.22595050 0.2331940 0.2373635 1001 0
## 6 0.48400210 0.52407730 0.55987018 0.5956471 0.5998064 1001 0