A workflow for infectious disease modelling

Off the back of the recent surveillance data workshop at CIRM, our group has been working on a paper that has now become a “workflow for infectious disease modelling”. It is very heavily inspired by [2011.01808] Bayesian Workflow and is, in some sense, an infectious disease modelling wrapper for it (or at least aims to be).

Infectious disease models can be used to inform critical public health decisions, yet often lack systematic development and validation practices.
The infectious disease modelling community has been slow to adopt rigorous model development and criticism cycles such as the Bayesian workflow, even as these methods become increasingly formalised and widely used in other domains. Recent outbreaks have demonstrated some domain-specific challenges that infectious disease modelling faces, including evolving research questions, emerging data sources, and adapting surveillance systems.
Here, we suggest a workflow for developing and evaluating infectious disease models, building on general Bayesian workflow advice and focusing on domain-specific challenges. As infectious disease models typically require multiple sources of information, the Bayesian paradigm is a natural framework. This workflow is designed for anyone developing an infectious disease model, and for users of model outputs who need to be able to evaluate modelling studies. At each stage, we provide recommendations based on our experience. We begin by outlining an approach for characterising epidemiological data source properties through a structured checklist. We then present an iterative workflow that extends the Bayesian workflow to the infectious disease domain, with the checklist informing decisions throughout each workflow stage. Our workflow includes defining the research question, developing Directed Acyclic Graph representations of process and observation models in a state-space framework, model modularisation, inference and computation choices, model specification and validation, integration method selection, and real-world considerations.
Throughout, we identify feedback loops where later decisions impact earlier choices. We also give guidance on using the workflow in evolving settings, such as outbreaks, and on how to report its use. To demonstrate this workflow, we use four schematic case studies that progressively integrate data sources for estimating transmission intensity.
In each one, we give examples of navigating real-world trade-offs between model complexity, computational feasibility, and inferential goals.
These case studies highlight how different data types can provide complementary information but may also impact other workflow choices.
Our suggested framework emphasises parsimony, modularity, interpretability, and model criticism. By proposing domain-specific workflow practices, we aim to provide a foundation for improving the quality and transparency of infectious disease modelling, particularly during outbreaks where flexible, principled approaches are essential.

Draft: First draft · Issue #24 · seabbs/infectious-disease-modelling-with-multiple-datasources · GitHub

We are still getting co-author feedback etc., so this isn’t quite ready for prime time, but we are very keen to hear feedback. Note that it’s very long, so make sure to read the part of the introduction about suggested user journeys. The general plan is to keep this as a preprint for a while, with a few rounds of updates as we get comments.

For those interested, I am very keen to find collaborators to implement some examples of using this workflow in different tool stacks, both as a stress test of it and to have something to point people at.
