Bayesian Modelling of Epidemic Processes
Written by Daniel Lawson from the University of Bristol School of Statistical Science.
Bayesian Modelling
Statistical inference: Bayesian modelling is a statistical inference procedure. Inference means learning from data. You have all done Likelihood-based inference, in which we write down the probability of the data given the parameters and find the parameter values for which this probability is maximised.
Bayesian inference uses Bayes Theorem,
P(θ∣x) = P(x∣θ)P(θ) / P(x),
to write a Posterior probability for the parameters given the data. This uses the Likelihood P(x∣θ) and the Prior P(θ).
Computing the posterior is hard because we often cannot compute the normalising constant P(x)=∫P(x∣θ)P(θ)dθ.
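As a concrete illustration, here is a minimal sketch (assuming a toy Beta prior and binomial likelihood, chosen purely for illustration) that computes the posterior on a grid, approximating the normalising constant P(x) by numerical integration. In one dimension this is trivial; the problem is that the cost of such grids grows exponentially with the dimension of θ.

```python
import numpy as np
from scipy import stats

# Toy problem: infer an infection probability theta from x positives in n tests.
x, n = 7, 50

# A fine grid over the one-dimensional parameter space.
theta = np.linspace(1e-6, 1 - 1e-6, 1000)
dtheta = theta[1] - theta[0]

prior = stats.beta.pdf(theta, 2, 8)        # P(theta): an illustrative Beta(2, 8) prior
likelihood = stats.binom.pmf(x, n, theta)  # P(x | theta)
unnormalised = likelihood * prior          # P(x | theta) P(theta)

# P(x) = integral of P(x | theta) P(theta) dtheta, here a simple Riemann sum.
evidence = unnormalised.sum() * dtheta
posterior = unnormalised / evidence        # P(theta | x), by Bayes Theorem

print(f"P(x) ~= {evidence:.4f}")
print(f"posterior mean ~= {(theta * posterior).sum() * dtheta:.3f}")
```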
What Bayesian Inference means
If we provide a Prior that represents our true subjective beliefs, and a Model that contains all the possibilities that we believe could be true, then the Posterior is the correct probability of the parameters given the data. It is a consistent update rule, meaning that we will always get this answer, whatever order we see the data. It is “the right thing to do”.
Incorporating prior knowledge is often vital. See for example XKCD: Did the sun just explode?.
The two caveats are key, however. We often do not know how to completely specify our subjective beliefs over a complicated parameter space. We also rarely believe that we are entertaining all possible models that could be true.
- Model misspecification: If we do not include the true model in θ, then we are searching instead for the “least bad solution in the parameter space”. This can have weird properties: we don’t just get a point estimate - we compute a whole posterior distribution, which under misspecification concentrates on the parameters whose predictions are closest to the truth, and its spread can behave strangely. See e.g. Figure 3 of Grunwald and van Ommen 2017 Inconsistency of Bayesian Inference for Misspecified Linear Models…
- Prior misspecification: We often use convenience priors for computational reasons (see below). We almost always specify our priors on each parameter independently, or at best pairwise, because quantifying beliefs over high-dimensional spaces is hard. This is a real problem for Process Modelling such as for Epidemics, where parameters are correlated because they are chosen to “mean something”.
The end result of these issues is that Bayesian Inference is often best considered to be a computational procedure for arriving at parameter estimates, rather than about beliefs.
How to do Bayesian Inference
Some key concepts:
- Numerical Integration is key to Bayesian Inference.
- The simplest approaches are grid-based numerical integration and plain Monte-Carlo sampling, but these scale poorly with the dimension of the parameter space.
- MCMC: Instead we typically use Markov-Chain Monte-Carlo (MCMC) (or see Wikipedia) because it behaves better with dimension than regular numerical integration; a minimal sampler is sketched after this list.
- Intractable Likelihoods (ABC): Using MCMC requires being able to compute the Likelihood. Sometimes this is also intractable, e.g. if it involves many latent parameters, stochastic integration, etc. (NB Christophe Andrieu’s particle MCMC paper made a class of such models tractable; it may be helpful here. But there are always intractable models.)
- If the likelihood is intractable, we can instead simulate from the prior and evaluate how well the simulations fit the data. This is called Approximate Bayesian Computation (sketched after this list) and was (co-)invented by Mark Beaumont at Bristol.
- Doing this approximates the likelihood using the model fit. This is really difficult to get right, and is an area where Machine Learning is very important.
- The goal here is essentially to learn a set of summary statistics whose distance between simulated and observed data captures the likelihood.
- Identifiability is a really important idea. A model is identifiable if the parameters could be inferred uniquely from the data, given enough of it. Most of the models we are interested in are not identifiable, or at best “weakly identifiable”, meaning that we cannot practically expect to recover the values. This matters for parameter inference because when parameters are unidentifiable the posterior distribution does not converge to a point; it remains a complex object that is really hard to sample from.
- Model choice is also really important. Above I wrote that we must “include the true model in θ”. It has long been debated whether choosing between two models θ={M1,M2} is the same as parameter estimation. The short answer is “no”: model choice (in which parameters mean different things in different models) is really hard.
- Advanced methods include:
- Variational inference: abandon the idea that the posterior is about beliefs and that we want to report it exactly. Instead of sampling from it, we approximate it by the nearest member of a parametric family, optimising the fit of this whole distribution to the posterior. This changes the problem from integration/sampling to optimisation, which is typically much easier; a minimal sketch is given after this list.
- Parallelisation by various divide-and-conquer approaches, including Expectation Propagation (EP).
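First, a minimal random-walk Metropolis-Hastings sampler, continuing the toy Beta-binomial example above. The step size, chain length and burn-in are illustrative assumptions, not tuned values; the key point is that the unknown normalising constant P(x) cancels in the acceptance ratio.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x, n = 7, 50  # same toy data as above

def log_unnorm_posterior(theta):
    """log P(x | theta) + log P(theta), up to the constant log P(x)."""
    if not 0 < theta < 1:
        return -np.inf
    return stats.binom.logpmf(x, n, theta) + stats.beta.logpdf(theta, 2, 8)

theta = 0.5                  # arbitrary starting point
samples = []
for _ in range(20000):
    proposal = theta + rng.normal(scale=0.05)      # symmetric random-walk proposal
    # Metropolis acceptance rule: P(x) cancels, so only the unnormalised
    # posterior is ever needed.
    log_accept = log_unnorm_posterior(proposal) - log_unnorm_posterior(theta)
    if np.log(rng.uniform()) < log_accept:
        theta = proposal
    samples.append(theta)

burned = np.array(samples[5000:])                  # discard burn-in
print(f"posterior mean ~= {burned.mean():.3f}")
```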
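Second, a minimal ABC rejection sampler for the same toy problem, pretending the likelihood were unavailable. The summary statistic (here just the raw count) and the tolerance are illustrative assumptions; choosing them well is exactly the hard part described above.

```python
import numpy as np

rng = np.random.default_rng(2)
x_obs, n = 7, 50
tolerance = 2   # accept simulations whose summary lands within this distance

accepted = []
for _ in range(100000):
    theta = rng.beta(2, 8)                 # 1. draw theta from the prior
    x_sim = rng.binomial(n, theta)         # 2. simulate data from the model
    if abs(x_sim - x_obs) <= tolerance:    # 3. keep theta if the summary is close
        accepted.append(theta)

accepted = np.array(accepted)
print(f"{len(accepted)} acceptances, approximate posterior mean ~= {accepted.mean():.3f}")
```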
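Finally, a sketch of variational inference on the same toy posterior: fit a Gaussian q(θ) = N(μ, σ) by maximising a Monte-Carlo estimate of the ELBO via the reparameterisation trick. The Gaussian family, the fixed set of common random draws, and the use of a generic Nelder-Mead optimiser are all illustrative simplifications, not a production method.

```python
import numpy as np
from scipy import stats, optimize

x, n = 7, 50
eps = np.random.default_rng(3).normal(size=2000)   # fixed draws: common random numbers

def log_joint(theta):
    """log P(x | theta) + log P(theta) for the toy Beta-binomial model."""
    theta = np.clip(theta, 1e-9, 1 - 1e-9)         # keep samples inside (0, 1)
    return stats.binom.logpmf(x, n, theta) + stats.beta.logpdf(theta, 2, 8)

def negative_elbo(params):
    mu, log_sigma = params
    theta = mu + np.exp(log_sigma) * eps           # reparameterised draws from q
    # ELBO = E_q[log P(x, theta)] + entropy of q (Gaussian entropy = log sigma + const).
    return -(log_joint(theta).mean() + log_sigma)

result = optimize.minimize(negative_elbo, x0=[0.2, -2.0], method="Nelder-Mead")
mu, sigma = result.x[0], np.exp(result.x[1])
print(f"variational fit: N({mu:.3f}, {sigma:.3f})")
```

Note how the problem really has become optimisation: there is no sampling from the posterior at all, only repeated evaluation of the (unnormalised) joint density.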
Epidemic Processes
Epidemic processes are dynamical systems in which the state ϕ(t+δt) at time t+δt depends on ϕ(t), the state at time t. Some statistical models, that is, models for which we can write an explicit likelihood, have this property: these include renewal processes, ARIMA models, etc. However, for most dynamical systems, and especially those with interesting intrinsic dynamics, the likelihood has no explicit form.
The hierarchy of models might be described as:
- ODEs (Ordinary Differential Equations): a set of variables co-evolve according to some ODE. This makes the output deterministic given the input, and therefore a Likelihood can be written as an “error model”, that is, how far is the data from the prediction?
- PDEs (Partial Differential Equations): a set of variables in space co-evolve according to a PDE. These are also deterministic but numerical integration is much harder, and the initial parameter space is technically infinite (the boundary conditions are a function, not a set of parameters).
- SDEs (Stochastic Differential Equations): a set of variables co-evolve according to some equation with a deterministic term plus a noise term. Some SDEs are nicely behaved, e.g. they can be driven by Gaussian noise and handled with a Kalman Filter or particle MCMC, but many are not. Numerical integration is often hard, and the output is probabilistic.
- Compartmental models: Generalising the above, space, age categories, households, etc. can be split up into multiple “bins”, for each of which an ODE/SDE can be written.
- IBMs (Individual Based Models): the highest-fidelity representation of reality is a detailed model describing every detail we think might be important. These are very hard to infer because they are costly to simulate, and because many of their parameters have very similar effects on how the model behaves.
In all cases, these dynamical systems may have interesting intrinsic structure. For example, there may be a “phase transition” where an epidemic changes from being periodic, to happening only once, to not happening at all; the SIR sketch below shows the simplest such threshold, at R0 = 1.
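Here is a minimal deterministic SIR model, the simplest compartmental ODE (all parameter values and the observation noise level are illustrative assumptions). It shows that the output is fully determined by the parameters, that a likelihood can be attached via an “error model” on the residuals, and that the qualitative behaviour flips at R0 = β/γ = 1.

```python
import numpy as np
from scipy import stats
from scipy.integrate import solve_ivp

def sir(t, y, beta, gamma):
    """Classic SIR compartmental ODE: S susceptible, I infected, R recovered."""
    S, I, R = y
    return [-beta * S * I, beta * S * I - gamma * I, gamma * I]

y0 = [0.999, 0.001, 0.0]               # initial fractions of the population
t_eval = np.linspace(0, 200, 400)

for beta, gamma in [(0.3, 0.1), (0.08, 0.1)]:   # R0 = 3 (epidemic) vs R0 = 0.8 (dies out)
    sol = solve_ivp(sir, (0, 200), y0, args=(beta, gamma), t_eval=t_eval)
    print(f"R0 = {beta / gamma:.1f}: peak infected fraction = {sol.y[1].max():.4f}")

# An "error model" likelihood for the deterministic output: pretend we observe
# the infected fraction with IID Gaussian noise (noise level is illustrative).
obs = sol.y[1] + np.random.default_rng(0).normal(scale=0.002, size=sol.y[1].size)
log_lik = stats.norm.logpdf(obs, loc=sol.y[1], scale=0.002).sum()
print(f"log-likelihood of noisy observations under the ODE trajectory: {log_lik:.1f}")
```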
Open research questions for the UG projects
There are open directions that can be explored.
Model mis-specification
- If there was a true, complex model (say, a compartmental one similar to our paper, Booton et al 2020) and we performed inference with a simpler model that lacked the fine-grained detail, what would be lost? Under which circumstances? Are the parameters for COVID-19 well-behaved?
- What if we replace the above compartmental model by an SDE, which we infer with an ODE? ODEs lead to a “likelihood” for the residuals, which is normally simple and independent (Gaussian IID, negative binomial, etc.). Can we change the likelihood to make it work better? This represents the “cost” of using a solvable dynamical model that can be handled in STAN when the truth is something more complicated; see the stochastic SIR sketch after this list.
- If there was a true epidemic model, but we used a statistical model chosen for tractability, what is lost? What questions provide “honest” (e.g. unbiased) answers, and which could result in dangerous inference due to model mis-specification?
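To make the ODE-vs-SDE question concrete, here is a minimal Euler-Maruyama simulation of a stochastic SIR model. The demographic-style noise on the infection term, the noise scale, and all parameter values are illustrative assumptions. Fitting the deterministic ODE above to trajectories like these, with an IID error model on the residuals, is exactly the mis-specification described in the second bullet, since the residuals of a stochastic epidemic are strongly autocorrelated.

```python
import numpy as np

rng = np.random.default_rng(4)
beta, gamma, noise = 0.3, 0.1, 0.02   # illustrative parameter choices
dt, steps = 0.1, 2000

S, I = 0.999, 0.001
trajectory = []
for _ in range(steps):
    # Euler-Maruyama step: deterministic drift plus sqrt(dt)-scaled Gaussian noise.
    infections = beta * S * I * dt + noise * np.sqrt(S * I * dt) * rng.normal()
    infections = min(max(infections, 0.0), S)      # keep the state physical
    recoveries = gamma * I * dt
    S, I = S - infections, I + infections - recoveries
    trajectory.append(I)

print(f"peak infected fraction = {max(trajectory):.4f}")
```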
Approximate Bayesian Inference
- How well do current state-of-the-art approaches to summary statistic selection perform on epidemic models? Do the accessible machine learning approaches work better? Under which circumstances (e.g. a high number of parameters, or a dynamical system near a phase transition)?
- What state-of-the-art Machine learning approaches could be incorporated into these models?
- What is the advantage of using an SDE over an ODE, given that we cannot do STAN inference on an SDE?
Conceptual questions
- How does Bayesian Inference resemble maximum likelihood inference for epidemic models? How much information is coming from the prior, vs the data?
- What additional data do we need to collect in order to make complex (high-dimensional) parameter estimation procedures work?
- How does the computational performance of MCMC behave as the complexity of the model grows? Are there practical limits to how complex the model can be?
- What is lost when using alternative inference approaches such as variational methods? Are any of them both fast enough and accurate enough to be useful?
COVID-19 and directly relevant Papers
Approximate Bayesian Computation
Chandra 2020 “Stochastic Compartmental Modelling of SARS-CoV-2 with Approximate Bayesian Computation”.
Full Bayesian Modelling using ODEs
This is essentially using the toolkit STAN, following the recipe outlined by Grinsztajn et al 2020 in Bayesian workflow for disease transmission modeling in Stan.
There are few application papers to be found, however.
There is the South Africa paper that fails to cite STAN.
A different sampler is used in Why is it difficult to accurately predict the COVID-19 epidemic? Is it as good?
Full Bayesian Modelling using approximations
Most approaches you find will use this strategy; they change the model and predict a particular aspect of it.
Sampling approaches with a Bayesian Interpretation
Our paper, Booton et al 2020 Estimating the COVID-19 epidemic trajectory and hospital capacity requirements in South West England: a mathematical modelling framework uses a more complex model and samples parameter space.
A Machine Learning approach for Short-term forecasting COVID-19 cumulative confirmed cases: Perspectives for Brazil (Dal Molin Ribeiro et al 2020).
Background reading
Approximate Bayesian Computation
Fraser 2020 preprint on ABC that is robust to model mis-specification
Not all ABC needs sampling. Variational approaches are interesting. This one is from Dennis Prangle’s group: Black-box Variational Inference for Stochastic Differential Equations
Machine Learning and ABC
Bayesian approaches