I got asked to write up a little something on what I have been doing (with others) on delay estimation and what the future holds. I thought others might like to see the semi-incoherent babble. For anyone involved in this stuff, sorry if I have mislabelled anything or got things wrong; please flag and correct as you like.
I just threw this together, so apologies for any lack of clarity. I may also have missed things or not properly connected things, so please flag.
samiverse
This is a joke around being asked for what I have done - obviously, all of the below work is very team-oriented, and in several instances, all I did was cheerlead.
The TLDR is that I am not keen to keep implementing shards of the same model to support feature y, which is part of the motivation for my push towards composable Julia (amongst other motivations).
-
A slide deck from a few years ago: https://samabbott.co.uk/presentations/2024/why-am-i-so-late.pdf
-
Estimating epidemiological delay distributions: https://www.medrxiv.org/content/10.1101/2024.01.12.24301247v1
-
Builds on work by Seaman et al. to assess a range of methods for estimating delay distributions in outbreak settings. Concludes that the Ward and epinowcast (i.e. marginal joint count and delay) methods are the best choices in most settings.
-
Concludes that you do not need to account for epidemic phase explicitly in most settings. This is because it is much easier to treat it as right truncation and deal with it that way, not because you can ignore it.
-
Concludes that naive discretisation is biased both in estimation and in use.
-
Connects nowcasting and delay estimation
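The "treat epidemic phase as right truncation" point above can be sketched as follows. This is a minimal illustration, not the method from the paper: it assumes an exponential delay (so everything fits in the standard library), ignores censoring entirely, and uses hypothetical helper names (`truncated_loglik`, `simulate_growing_outbreak`, `golden_section_max`). Each observed delay is conditioned on being short enough to have been seen by the observation time.

```python
import math
import random


def truncated_loglik(rate, delays, max_delays):
    """Exponential delay log-likelihood with per-observation right truncation."""
    total = 0.0
    for delay, max_delay in zip(delays, max_delays):
        log_pdf = math.log(rate) - rate * delay
        # log F(max_delay): probability the delay was short enough to be seen.
        log_cdf_at_truncation = math.log(-math.expm1(-rate * max_delay))
        total += log_pdf - log_cdf_at_truncation
    return total


def simulate_growing_outbreak(n, rate, growth, obs_time, rng):
    """Keep only pairs whose secondary event happens before obs_time."""
    delays, max_delays = [], []
    while len(delays) < n:
        # Inverse-CDF draw of a primary event time with density
        # proportional to exp(growth * t) on [0, obs_time].
        u = rng.random()
        primary = math.log1p(u * math.expm1(growth * obs_time)) / growth
        delay = rng.expovariate(rate)
        if primary + delay <= obs_time:
            delays.append(delay)
            max_delays.append(obs_time - primary)
    return delays, max_delays


def golden_section_max(fn, lo, hi, tol=1e-6):
    """Maximise a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        left = hi - inv_phi * (hi - lo)
        right = lo + inv_phi * (hi - lo)
        if fn(left) > fn(right):
            hi = right
        else:
            lo = left
    return (lo + hi) / 2.0


rng = random.Random(1)
delays, max_delays = simulate_growing_outbreak(
    n=5000, rate=0.2, growth=0.1, obs_time=30.0, rng=rng
)
# Naive estimate: the plain sample mean, which ignores truncation.
naive_mean = sum(delays) / len(delays)
adjusted_rate = golden_section_max(
    lambda rate: truncated_loglik(rate, delays, max_delays), 0.01, 2.0
)
adjusted_mean = 1.0 / adjusted_rate
print(f"true mean 5.0, naive {naive_mean:.2f}, adjusted {adjusted_mean:.2f}")
```

In a growing outbreak the naive mean comes out too short because recent primary events with long delays have not been observed yet. The truncation-adjusted likelihood corrects for this without modelling the growth rate at all.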
-
-
Best practices for estimating epi delay distributions: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012520
-
Practitioner-focussed guidance for estimating and reporting epi delay distributions. Designed to be used as a reporting checklist or an estimation checklist.
-
Gives a high-level overview of biases and methods to account for them.
-
Recommends the latent/Ward method as the best choice of method in most settings.
-
-
Primarycensored: https://primarycensored.epinowcast.org/
- CRAN package with one dependency (pracma). Uses vendoring to supply Stan-side tools to those who want them. Pure R tools for estimation and discretisation.
- Leverages the insight that double censoring is really a two-stage process, i.e. primary and secondary censoring: solve the primary case first and then treat the resulting CDF as any other CDF (i.e. secondary event censoring, truncation, etc., as normal). This allows for 1. A numerical solution to all combinations of distributions, which outperforms the latent/Ward method in virtually all real-world data settings and scales with the number of unique strata considered rather than the number of data points, as the latent/data augmentation approaches do. 2. Analytical solutions that are easier to find (as it is a single integral) - implemented analytical solutions include gamma, lognormal, and Weibull with uniform primary events. Combined, this allows for models that are 1000s of times faster and can fit to reasonable data in a few seconds with full MCMC (see the docs).
- Manuscript in draft. TLDR: it is faster and more stable than the latent method. So far we have found no realistic edge cases where the latent method would be preferred.
- Implements a fitdistrplus extension for optional pure R fitting and an optional built-in Stan model for Stan-based fitting. Has tools to vendor Stan code to other Stan models.
- The discretisation methods implemented here are exact, and this can be confirmed via simulation or comparison to independent solutions, e.g. EpiEstim contains one for the double censored gamma (there is room for us having some kind of dream but I think only a bit). There is a range of issues in the epi ecosystem indicating where this could be adopted, but with limited/no traction so far.
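The two-stage idea above can be sketched in a few lines. This is not the primarycensored implementation: it is a standard-library illustration that assumes a uniform primary event within a one-day window, a lognormal delay, and a midpoint-rule approximation to the single integral F_cens(q) = (1 / w) * integral from 0 to w of F(q - p) dp. All function names here are hypothetical. It then checks the resulting daily PMF against simulation and against one reading of "naive" discretisation (differencing the continuous CDF at whole days), which is my assumption about what naive means here.

```python
import math
import random


def lognormal_cdf(x, meanlog, sdlog):
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - meanlog) / (sdlog * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def primary_censored_cdf(q, delay_cdf, pwindow=1.0, n_grid=2000):
    """Midpoint-rule average of delay_cdf(q - p) over a uniform primary window."""
    step = pwindow / n_grid
    total = sum(delay_cdf(q - (i + 0.5) * step) for i in range(n_grid))
    return total * step / pwindow


def double_censored_pmf(delay_cdf, max_delay):
    """Secondary (daily) censoring: difference the primary-censored CDF."""
    cens = [primary_censored_cdf(float(k), delay_cdf) for k in range(max_delay + 2)]
    return [cens[k + 1] - cens[k] for k in range(max_delay + 1)]


def naive_pmf(delay_cdf, max_delay):
    """Assumed 'naive' approach: difference the continuous CDF at whole days."""
    return [delay_cdf(k + 1.0) - delay_cdf(float(k)) for k in range(max_delay + 1)]


def simulate_daily_delays(n, meanlog, sdlog, rng):
    """Simulate continuous events, then record only the day of each event."""
    counts = {}
    for _ in range(n):
        primary = rng.random()  # time within the primary day (day 0)
        secondary = primary + rng.lognormvariate(meanlog, sdlog)
        day = math.floor(secondary)
        counts[day] = counts.get(day, 0) + 1
    return {day: count / n for day, count in counts.items()}


def cdf(x):
    return lognormal_cdf(x, 1.0, 0.5)


censored = double_censored_pmf(cdf, 20)
naive = naive_pmf(cdf, 20)
empirical = simulate_daily_delays(100_000, 1.0, 0.5, random.Random(42))
for day in range(6):
    print(day, round(censored[day], 3), round(naive[day], 3),
          round(empirical.get(day, 0.0), 3))
```

The double censored PMF should track the simulated frequencies, while the naive version puts too much mass on short delays because it ignores where in the day the primary event happened. Right truncation would then be handled as for any other CDF, by dividing by the primary-censored CDF at the truncation time.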
-
epidist: https://epidist.epinowcast.org/
-
Extends brms (a Bayesian regression modelling package) to include epi delay estimation suitable for outbreaks. Has an extensible design for adding new models and likelihoods, e.g. mixtures.
-
Implements two models currently, the latent method and the primarycensored marginal approach (by vendoring the code from that package).
-
Because it extends brms, it can fit partially pooled and time-varying delay distributions jointly to many strata, e.g. by age and location (see the vignettes).
-
It also has access to the brms ecosystem, e.g. tidybayes, and so can be used for a range of postprocessing tasks, including imputing the secondary events.
-
Also, due to the brms connection, it has native support for variational inference methods, which have reasonable performance/accuracy tradeoffs - that being said, the marginal analytical solutions fit on 100,000 data points with multiple partially pooled strata in a minute or so.
-
Part of this efficiency is in tooling to compress the data by strata where possible and use weighted likelihoods.
-
As it uses brms, it metaprogrammes Stan and so needs a compiler for either the cmdstan or rstan backend. Some people can’t install Stan.
-
Manuscript in draft.
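The efficiency point above about compressing data by strata and using weighted likelihoods can be illustrated with a small sketch. This is not epidist code: the geometric delay model, the stratum names, and the helper functions are all hypothetical stand-ins. The point is only that identical rows contribute identical log-likelihood terms, so one weighted term per unique row gives the same answer.

```python
import math
import random
from collections import Counter


def geometric_log_pmf(delay, prob):
    """Stand-in delay model: geometric on 0, 1, 2, ... (hypothetical choice)."""
    return math.log(prob) + delay * math.log1p(-prob)


def rowwise_loglik(rows, prob_by_stratum):
    """One likelihood term per data point."""
    return sum(geometric_log_pmf(d, prob_by_stratum[s]) for s, d in rows)


def weighted_loglik(counts, prob_by_stratum):
    """One likelihood term per unique (stratum, delay) row, weighted by count."""
    return sum(
        n * geometric_log_pmf(d, prob_by_stratum[s]) for (s, d), n in counts.items()
    )


rng = random.Random(7)
true_prob = {"young": 0.3, "old": 0.15}
rows = []
for _ in range(20_000):
    stratum = rng.choice(sorted(true_prob))
    # Inverse-CDF draw from the geometric distribution.
    delay = int(math.log(1.0 - rng.random()) / math.log1p(-true_prob[stratum]))
    rows.append((stratum, delay))

counts = Counter(rows)  # compress to unique rows with weights
full = rowwise_loglik(rows, true_prob)
compressed = weighted_loglik(counts, true_prob)
print(len(rows), "rows ->", len(counts), "unique rows")
print(full, compressed)
```

The cost of each likelihood evaluation then scales with the number of unique rows rather than the number of data points, which matters most when each term is expensive (for example, a numerical integral).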
-
-
Epinowcast: https://package.epinowcast.org/
-
From the package summary: A modular Bayesian framework for real-time infectious disease surveillance. Provides tools for nowcasting, reproduction number estimation, delay estimation, and forecasting from data subject to reporting delays, right-truncation, missing data, and incomplete ascertainment. Users can build models suited to their setting using a flexible formula interface supporting fixed effects, random effects, random walks, and time-varying parameters, with options including parametric and non-parametric delay distributions with optional modifiers (via discrete-time hazard models), renewal processes, observation models, missing data imputation, and stratified analyses with partial pooling. By jointly estimating disease dynamics and reporting patterns, our framework enables earlier and more reliable detection of trends.
-
Something like 10 times faster than naive versions of this model. Supports variational inference etc. Stan-based, here via cmdstan, and so needs a compiler. Some people can’t install Stan.
-
One of the key components from that is the handling of reporting anomalies and event imputation for both primary (which, if done naively, is liable to induce bias) and secondary events.
-
Can also handle multiple delays in a joint fashion with partial pooling etc., as in epidist.
-
It has a very large number of features due to aiming to model all (or at least many) linelist data problems, see: https://package.epinowcast.org/articles/features.html. This is also, in my opinion, its largest con.
-
There is no option to disable the joint component of the model. If there were, then the overlap between this and epidist would become quite large in terms of functionality (here a superset), if not implementation.
-
-
CensoredDistributions.jl: https://censoreddistributions.epiaware.org/dev/
-
A pure Julia implementation of primarycensored w/ tutorials demonstrating fitting etc.
-
Extremely awesome code if I say so myself. Self awarding 10/10.
-
Part of demonstrating why Julia >> Stan/R stack and a test bed for ideas around DSL layers etc. (see https://bsky.app/profile/seabbs.bsky.social/post/3mexz5xhws22j for future directions here)
-
-
EpiNow2: Estimate and Forecast Real-Time Infection Dynamics
-
Non-truncation adjusted delay estimator - estimate_delay
-
Truncation-adjusted delay estimator without proper handling of censoring - estimate_truncation. A variant of common multiplicative nowcasting methods but with a fun Bayesian twist. Totally ad hoc and unevaluated. It was the inspiration for the great ship epinowcast, which originated from making this model more flexible before I realised how displeasing its long-horizon nowcast error structure was.
-
Why are both of these in the same package? Before the great "nowcasting and delay estimation is the same" TLDR surprise (for me).
-
estimate_infections (the Rt bit of the package) takes the output of both of these functions (or from other sources) to model latent delays and right truncation of count data in a non-joint manner (via priors). This could all be done together (this is also the case in e.g. epinowcast, but a lot more of that is joint).
-
-
Baselinenowcast: https://baselinenowcast.epinowcast.org/
-
What epinowcast is to generative marginal joint nowcasting, this aims to be for chain ladder multiplicative nowcasting methods.
-
Pure play R package - dep light.
-
Snappy - no MCMC etc etc
-
Adds some twists to the classic multiplicative nowcasting methods so that they can support zero counts and make use of partial data
-
As part of the process creates truncation-adjusted non-parametric delay estimates. Note these are not double interval censoring corrected.
-
Paper with eval and good chat: https://wellcomeopenresearch.org/articles/10-614
-
Key benefits over e.g. epinowcast are that it is super duper fast, has no real deps, and handles negative count updates.
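For readers who have not met chain ladder multiplicative nowcasting, here is a minimal textbook sketch. It is not the baselinenowcast algorithm and does not include the twists described above; the function name and the toy reporting triangle are made up for illustration. Rows are reference dates, columns are reporting delays, and `None` marks cells that have not been reported yet.

```python
def chain_ladder_nowcast(triangle):
    """Complete a reporting triangle with classic chain ladder factors.

    triangle[t][d] is the count for reference date t reported with delay d,
    or None if that cell has not been observed yet (missing cells must sit
    at the end of each row).
    """
    n_delays = len(triangle[0])

    # Cumulative counts by delay, keeping None for unobserved cells.
    cumulative = []
    for row in triangle:
        running, cum_row = 0.0, []
        for value in row:
            if value is None:
                cum_row.append(None)
            else:
                running += value
                cum_row.append(running)
        cumulative.append(cum_row)

    # Development factor for delay d: growth of the cumulative count from
    # delay d - 1 to d, pooled over rows where delay d has been observed.
    factors = [1.0]
    for d in range(1, n_delays):
        observed = [row for row in cumulative if row[d] is not None]
        numerator = sum(row[d] for row in observed)
        denominator = sum(row[d - 1] for row in observed)
        factors.append(numerator / denominator)

    # Scale each row's latest cumulative count by the remaining factors.
    nowcast = []
    for row in cumulative:
        last = max(d for d in range(n_delays) if row[d] is not None)
        total = row[last]
        for d in range(last + 1, n_delays):
            total *= factors[d]
        nowcast.append(total)
    return nowcast


# Toy triangle built from totals 100, 200, 400, 800 and delay PMF 0.5/0.3/0.2.
triangle = [
    [50, 30, 20],
    [100, 60, 40],
    [200, 120, None],
    [400, None, None],
]
nowcast = chain_ladder_nowcast(triangle)
print(nowcast)
```

Because the toy data follow a fixed delay distribution exactly, the nowcast recovers the true totals (up to floating point). Note that this plain version divides by a pooled cumulative count, so it breaks when that count is zero, which is the kind of case the zero-count handling mentioned above is aimed at.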
-
Work in progress
-
Mixture and non-parametric support for primarycensored and epidist (see issues). Mostly stuck due to time constraints.
-
Implementing primarycensored model to replace the delay estimation model in EpiNow2 with support for the EpiNow2 delay interface/priors, etc.
-
Support for non-joint delay estimation, negative count updates, and improved forecast tooling in epinowcast (see issues - also lots of stuff could be done here, but time/money). Somewhat stuck due to time constraints.
-
Work with Barbora Němcová and Johannes Bracher to find better observation models for marginal nowcasting approaches that better capture observation level variance vs process variance (in the delays, etc., etc.) as flagged by Stoner et al. 2020
-
Paper with Johannes Bracher, Jacco Wallinga and the nowcasting methods crew to summarise the current state of ID epi nowcasting methods. In draft.
-
Paper as part of the Insight Net collaboration to guide public health practitioners in the US on how to deal with reporting challenges in surveillance data. Aiming to facilitate discussion between practitioners and modellers, i.e. practitioners say we have X challenges, modellers say we have Y solutions, and together we find what the Z gaps are. Contributions welcome, see https://www.epinowcast.org/GuideToSTLTReportingDelays/
-
Work with Kylie Ainslie, Sang Woo Park and others on establishing an MVP generation time estimation framework for future modular expansion and to serve as a baseline. See https://community.epinowcast.org/t/minimum-viable-model-for-generation-time-estimation/387/9
-
Proposed (rejected) grant to extend generation time estimation methods to include time-to-event approaches. Along the way, the suggestion was to extend the primary censoring approach to be nested (for generation time estimation) to leverage efficiency and to hit mixture and convolved primary and truncated dists. Big focus on modular infection processes to link to wider composability work. See the grant app here: https://community.epinowcast.org/t/addressing-critical-gaps-in-generation-time-estimation-during-outbreaks-grant-application/317/5 (still thinking about pathways forward). Joint estimation stands out here as a big gap, in my opinion, across lots of delays where the relationship between them and observation is complex, i.e. paired convolved/mixtures of distributions with observation-level biases and the potential role of infection processes (the most critical being where the infection process has a role).
-
Work with Nyall Jamieson, Lauren Meyers, Chris Overton and others on solving more analytical primarycensored distributions.
-
We (Nyall) have derived solutions for exponentially tilted primary event priors with exponential, lognormal, Weibull, gamma, and Burr (types 3 and 12) delay distributions.
-
We (Nyall) are working on and close to a general set of guidelines for what distributions can be solved, i.e., what you need and what the order ranking of difficulty might be.
-
Manuscript in draft. Solutions are expected in software in a few months or so.
-
Related to this work is software development to expose the derived Burr distribution to users, see "The Burr distribution as a model for the delay between key events in an individual’s infection history".
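To make "exponentially tilted primary event" concrete, here is a small worked sketch for the simplest case. It is my own illustration and not the derivation from the manuscript in draft: it assumes a primary event density proportional to exp(r * p) on a window [0, w] (a uniform tilted by r) and an exponential delay with rate `rate`. For q >= w the primary-censored CDF then reduces to 1 - exp(-rate * q) * r * (exp((rate + r) * w) - 1) / ((rate + r) * (exp(r * w) - 1)), which the code checks against direct numerical integration. Function names are hypothetical.

```python
import math


def tilted_density(p, r, w=1.0):
    """Uniform on [0, w] exponentially tilted by r: proportional to exp(r * p)."""
    if not 0.0 <= p <= w:
        return 0.0
    if r == 0.0:
        return 1.0 / w  # no tilt: plain uniform
    return r * math.exp(r * p) / math.expm1(r * w)


def cens_cdf_numeric(q, rate, r, w=1.0, n_grid=4000):
    """Midpoint-rule integral of F_delay(q - p) * g(p) over the primary window."""
    step = w / n_grid
    total = 0.0
    for i in range(n_grid):
        p = (i + 0.5) * step
        gap = q - p
        delay_cdf = -math.expm1(-rate * gap) if gap > 0.0 else 0.0
        total += delay_cdf * tilted_density(p, r, w) * step
    return total


def cens_cdf_closed(q, rate, r, w=1.0):
    """Closed form for an exponential delay; assumes q >= w, r != 0, rate + r != 0."""
    if q < w:
        raise ValueError("closed form in this sketch only covers q >= w")
    scale = r * math.expm1((rate + r) * w) / ((rate + r) * math.expm1(r * w))
    return 1.0 - math.exp(-rate * q) * scale


for q in (1.0, 2.5, 5.0):
    print(q, cens_cdf_numeric(q, 0.3, 0.5), cens_cdf_closed(q, 0.3, 0.5))
```

A positive tilt (a growing epidemic) pushes primary events towards the end of their window, so the primary-censored CDF sits below the uniform-primary version at the same q. The other delay distributions listed above are harder because the integral no longer factorises this neatly.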
-
-
Work with Nyall Jamieson, Lauren Meyers, Ian Hall and others relating epi delay estimation to non-infectious diseases and biohazard release modelling. TLDR: primary censoring and release modelling are kind of the same. Trendy. Might unlock insights in epi delays. Definitely allows for the use of existing delay tools in release modelling. See https://pubmed.ncbi.nlm.nih.gov/21242803/ and draw a diagram to get the link.
-
Work with Nyall Jamieson et al. to derive generation time and other epi delay distributions along the same lines as for the incubation period. Early stage. Lots of scope for exploration here.
-
Not really delays, but involved in developing brms in Julia. The connection is I am helping by iterating using CensoredDistributions.jl to recreate what we can do in epidist, but without a multi-year dev effort (i.e. using native Julia composability). This should be a good approach for the future for those looking at flexible and multi-strata, etc., etc. models without all the eng effort of rolling your own at home. Very exciting.
-
Related to this is the plan to do everything again, but this time in a modular Julia ecosystem, so we do not have to keep reimplementing for all the various shades of model expression and inference approaches. Also, to implement lessons learnt, etc., on which methods we need and what they should look like for most real-world use. Big focus on creating modular joint models as well as robust staged approaches.
Gaps
For me, aside from the somewhat in-progress work (lots of which I think is pretty important, has multiple viable pathways forward, and in the vast majority of cases also needs more boots on the ground), the main gaps are other forms of paired delay and all the many different edge cases and complexities. E.g. what happens when estimating IFR and you need delays for both dead and not dead, what happens in households, what kinds of paired data are common, and what biases and delays do these have? At some point, we start to connect with tree-based who-infected-whom methods. What is the theory overlap (i.e. similar to nowcasting and delay estimation)? How do we implement modular models for these that are efficient and get the basics right? Of the in-progress stuff, if the derived epi dists don’t work out, which dist to use and when is a key question. There is also near-infinite stuff in observation errors to think about and model. I think that is really only something that can be done in a modular ecosystem because it becomes high-dimensional fast.