Community seminar - 2025-06-04 - Alba Halliday - Modelling diseases with delayed reporting and nested structures using a hierarchical framework

For the June seminar we have Alba Halliday. More here: Alba Halliday - Modelling diseases with delayed reporting and nested structures using a hierarchical framework – Epinowcast

Please post asynchronous questions below.

Reminder that this is tomorrow!

Delayed reports hide
Bayesian models reveal
Truth in nested data

Also, it's quite a stealthy code base (as @jamesazam reminded me), so see here: GitHub - AlbaMH/nimbleCast: What the Package Does (One Line, Title Case)

Really great stuff!

Here is an LLM-generated summary for people.

Seminar Summary: Hierarchical Bayesian Modelling for Disease Surveillance

Presenter: Alba Halliday (University of Glasgow)
Collaborators: Oliver Stoner (University of Glasgow), Leonardo Bastos (Oswaldo Cruz Foundation, Brazil), Theo Economou (University of Exeter)

High-Level Overview

• Development of a hierarchical Bayesian framework for nowcasting disease surveillance data with reporting delays and nested structures
• Application to Brazilian SARI (Severe Acute Respiratory Illness) hospitalisations and nested COVID-19 cases
• Joint modelling approach that links total SARI cases with COVID-positive subset to improve prediction accuracy
• Implementation challenges and solutions for operational disease surveillance systems

Problem Context

• SARI hospitalisations: individuals with fever and cough onset within 10 days requiring hospitalisation
• Reporting delays: cases occur in week T but are reported with delays of 0, 1, 2+ weeks
• Nested structure: proportion of SARI cases that test positive for COVID-19
• Challenge: COVID test results have unknown reporting delays (no timestamp when results are added to records)
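
To make the reporting-delay setup concrete, here is a minimal Python sketch of a "reporting triangle": counts indexed by week of occurrence and delay, with cells that have not yet been reported masked out. The function name and the toy numbers are hypothetical and are not taken from the talk or from nimbleCast.

```python
import numpy as np

def mask_reporting_triangle(full_counts, now):
    """Return the counts visible at week `now` (0-indexed); unseen cells are NaN.

    full_counts[t, d] = cases occurring in week t and reported d weeks later.
    A cell is visible at week `now` only if t + d <= now.
    """
    full_counts = np.asarray(full_counts, dtype=float)
    n_weeks, n_delays = full_counts.shape
    weeks = np.arange(n_weeks)[:, None]
    delays = np.arange(n_delays)[None, :]
    observed = full_counts.copy()
    observed[weeks + delays > now] = np.nan
    return observed

# Hypothetical toy data: 4 weeks, delays of 0, 1 and 2+ weeks.
full = np.array([[10, 5, 2],
                 [12, 6, 3],
                 [9, 4, 2],
                 [11, 5, 1]])
visible = mask_reporting_triangle(full, now=3)

# Cases reported so far per week; the most recent weeks are undercounts.
reported_so_far = np.nansum(visible, axis=1)
```

The nowcasting task is to predict the masked cells (and hence the eventual weekly totals) from the visible ones.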

Methodological Approach

Generalised Dirichlet Multinomial (GDM) Model

• Joint model for total counts (negative binomial) and partial counts (generalised Dirichlet multinomial)
• More flexible than standard multinomial approaches due to additional dispersion parameter
• Better captures covariance structure in partial counts
• Implemented as a series of beta-binomial distributions for improved MCMC efficiency
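
As an illustration of the "series of beta-binomials" idea, the following Python sketch simulates from a GDM by drawing each delay's count as a beta-binomial share of the cases not yet reported. The mean/dispersion parameterisation (`nu`, `phi`), the function name and all numbers are assumptions for illustration; this is not the nimbleCast implementation.

```python
import numpy as np

def sample_gdm(total, nu, phi, rng):
    """Draw delay-specific counts that sum to at most `total`.

    nu[d]  : expected share of the not-yet-reported cases reported at delay d
    phi[d] : dispersion of that share (larger = closer to binomial)
    Returns the counts per delay and the cases still unreported afterwards.
    """
    counts = np.zeros(len(nu), dtype=int)
    remaining = int(total)
    for d, (nu_d, phi_d) in enumerate(zip(nu, phi)):
        # Beta-binomial = binomial whose probability is Beta distributed.
        p = rng.beta(nu_d * phi_d, (1.0 - nu_d) * phi_d)
        counts[d] = rng.binomial(remaining, p)
        remaining -= counts[d]
    return counts, remaining

rng = np.random.default_rng(1)

# Total count from a negative binomial with mean mu and size theta
# (hypothetical values).
mu, theta = 100.0, 10.0
total = rng.negative_binomial(theta, theta / (theta + mu))

counts, leftover = sample_gdm(total, nu=[0.5, 0.5, 0.5], phi=[20, 20, 20], rng=rng)
```

Each delay gets its own dispersion parameter, which is where the extra flexibility over a plain multinomial comes from.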

Nested Structure Integration

• COVID cases modelled as proportion of total SARI hospitalisations using a beta-binomial distribution
• Key innovation: link between expected SARI trend and expected COVID proportion through shared parameter (δ)
• Intuition: waves in SARI often driven by COVID waves
• Additional censoring layer to account for incomplete COVID reporting
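
One way such a link could be written down is sketched below. This is an assumed form for illustration only, not the model as presented in the talk: apart from δ, all symbols are hypothetical, and the delay and censoring layers are omitted.

```latex
\begin{align*}
y_t &\sim \text{NegBin}(\lambda_t, \theta), & \log \lambda_t &= \iota + \alpha_t,\\
x_t \mid y_t &\sim \text{BetaBin}(\pi_t, \phi, y_t), & \operatorname{logit} \pi_t &= \kappa + \beta_t + \delta\,\alpha_t.
\end{align*}
```

Here y_t is the total SARI count in week t, x_t the COVID-positive subset, α_t the SARI temporal effect and β_t a COVID-specific temporal effect. In this sketch δ controls how strongly the SARI trend feeds into the COVID proportion, and δ = 0 would decouple the two.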

Key Results

• Improved prediction precision compared to existing nowcasting approaches for COVID fatalities
• Joint modelling shows better performance than separate models, particularly for capturing trend dynamics
• Age covariate further improved COVID predictions (elderly more likely to be hospitalised for COVID)
• Model successfully captures both temporal trends and delay distributions in real data

Operational Considerations

Current Practice

• Fiocruz runs the InfoGripe surveillance system using INLA (fast: 2-3 minute runtime)
• Marginal model approach with simpler computational requirements

Proposed Framework Benefits

• More flexible model specification (handles non-standard distributions)
• Potentially more robust (INLA can have installation issues and struggles with sparse data)
• Enables joint modelling of multiple data streams

Implementation Challenges

• Computational cost: ~12 hours for rolling nowcast experiment vs. minutes for INLA
• Requires MCMC parameter specification and more programming
• Higher barrier for new users compared to INLA’s simplicity

Software Development

nimbleCast R Package

• General package for fitting various nowcasting models
• Similar syntax to INLA to facilitate transitions
• Supports both GDM and simpler approaches
• Available on presenter’s GitHub (still in development)

Future Directions

• Extension to multiple viruses (RSV, influenza) alongside COVID within SARI framework
• Forecasting capabilities beyond nowcasting for preventive interventions
• Application to other nested surveillance structures:

  • Viral variants within viruses
  • Deaths as subset of cases
  • Dengue cases (lab-confirmed subset of dengue-like symptoms)

• Potential pathogen interaction modelling with semi-mechanistic approaches

Technical Specifications

• Data: Brazilian SARI surveillance 2021-2024, 27 federal units
• Delays: up to 20 weeks for SARI, 30 weeks for COVID
• Window: 60-week rolling window for analysis
• Implementation: R package NIMBLE for MCMC sampling
• Spatial effects: independent across federal units (extensible to spatiotemporal)

Discussion Points Raised

• Stability of COVID-SARI link over time and with changing epidemiology
• Potential for multiple pathogen interaction terms
• Window length optimisation for operational use
• Alternative approaches (e.g., downloading historic datasets to reconstruct delays, fitting to the counts with a joint model)
• Computational efficiency improvements (new sampling methods in development)

My main question: what would you recommend we think about adding to epinowcast from this work, and in what order?