Really great stuff!
Here is an LLM-generated summary for people.
Seminar Summary: Hierarchical Bayesian Modelling for Disease Surveillance
Presenter: Alba Halliday (University of Glasgow)
Collaborators: Oliver Stoner (University of Glasgow), Leonardo Bastos (Oswaldo Cruz Foundation, Brazil), Theo Economou (University of Exeter)
High-Level Overview
• Development of a hierarchical Bayesian framework for nowcasting disease surveillance data with reporting delays and nested structures
• Application to Brazilian SARI (Severe Acute Respiratory Illness) hospitalisations and nested COVID-19 cases
• Joint modelling approach that links total SARI cases with their COVID-positive subset to improve prediction accuracy
• Implementation challenges and solutions for operational disease surveillance systems
Problem Context
• SARI hospitalisations: individuals with fever and cough onset within 10 days requiring hospitalisation
• Reporting delays: cases occur in week T but are reported with delays of 0, 1, 2+ weeks
• Nested structure: proportion of SARI cases that test positive for COVID-19
• Challenge: COVID test results have unknown reporting delays (no timestamp when results are added to records)
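To make the delay structure concrete, here is a minimal Python sketch of a reporting triangle. All counts are made up for illustration, and the helper names (`observed_triangle`, `reported_so_far`) are hypothetical; they are not from the talk or from any package.

```python
# Toy reporting triangle: rows are onset weeks, columns are delays in weeks.
# All counts below are invented for illustration only.
full_counts = [
    [50, 30, 15, 5],   # week 0: cases reported with delay 0, 1, 2, 3
    [60, 35, 20, 5],   # week 1
    [80, 40, 25, 10],  # week 2
    [70, 45, 20, 10],  # week 3
]

def observed_triangle(counts, now):
    """Hide cells not yet reported at week `now` (onset week t + delay d > now)."""
    return [
        [z if t + d <= now else None for d, z in enumerate(row)]
        for t, row in enumerate(counts)
    ]

def reported_so_far(observed):
    """Sum the cells already observed for each onset week."""
    return [sum(z for z in row if z is not None) for row in observed]

now = 3  # latest week with any data
obs = observed_triangle(full_counts, now)
partial_totals = reported_so_far(obs)
true_totals = [sum(row) for row in full_counts]
```

The gap between `partial_totals` and `true_totals` for the most recent weeks is what a nowcast has to fill in.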
Methodological Approach
Generalised Dirichlet Multinomial (GDM) Model
• Joint model for total counts (negative binomial) and partial counts (generalised Dirichlet multinomial)
• More flexible than standard multinomial approaches due to additional dispersion parameter
• Better captures covariance structure in partial counts
• Implemented as a series of beta-binomial distributions for improved MCMC efficiency
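The chained beta-binomial idea can be sketched in Python as below. This is a simulation sketch only, not the presenter's implementation: the total is taken as given rather than drawn from a negative binomial, and the mean/dispersion parameterisation (`nu`, `phi`) is an assumption chosen for illustration.

```python
import random

def beta_binomial(n, nu, phi, rng):
    """Draw from a beta-binomial with mean proportion nu and dispersion phi.

    Assumed parameterisation: p ~ Beta(nu * phi, (1 - nu) * phi),
    then x ~ Binomial(n, p), drawn here as a sum of Bernoulli trials.
    """
    p = rng.betavariate(nu * phi, (1.0 - nu) * phi)
    return sum(rng.random() < p for _ in range(n))

def sample_gdm(total, nus, phis, rng):
    """Split `total` into delay counts via chained beta-binomials.

    nus[d] is the expected share of the cases still unreported before
    delay d that get reported at delay d.
    """
    counts, remaining = [], total
    for nu, phi in zip(nus, phis):
        z = beta_binomial(remaining, nu, phi, rng)
        counts.append(z)
        remaining -= z
    counts.append(remaining)  # whatever is left falls in the final delay bin
    return counts

rng = random.Random(1)
delay_counts = sample_gdm(200, [0.5, 0.5, 0.5], [20.0, 20.0, 20.0], rng)
```

Because each delay has its own dispersion `phi`, the partial counts can be more or less variable than a plain multinomial would allow, which is the extra flexibility mentioned above.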
Nested Structure Integration
• COVID cases modelled as proportion of total SARI hospitalisations using beta binomial distribution
• Key innovation: link between expected SARI trend and expected COVID proportion through shared parameter (δ)
• Intuition: waves in SARI often driven by COVID waves
• Additional censoring layer to account for incomplete COVID reporting
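The summary does not give the exact form of the link, so the following Python sketch shows only one plausible way a shared parameter could couple the two expectations. The log and logit links, the function names, and the toy `wave` trend are all assumptions made for illustration.

```python
import math

def expected_sari(t, intercept, trend):
    """Assumed form: log(lambda_t) = intercept + trend(t)."""
    return math.exp(intercept + trend(t))

def expected_covid_share(t, intercept, trend, delta):
    """Assumed form: logit(mu_t) = intercept + delta * trend(t).

    delta scales how strongly the SARI trend feeds into the COVID proportion.
    """
    eta = intercept + delta * trend(t)
    return 1.0 / (1.0 + math.exp(-eta))

def wave(t):
    """Hypothetical smooth yearly wave standing in for the SARI trend."""
    return math.sin(2 * math.pi * t / 52)

peak, trough = 13, 39  # weeks where the toy wave is at its maximum and minimum
share_at_peak = expected_covid_share(peak, -1.0, wave, delta=1.5)
share_at_trough = expected_covid_share(trough, -1.0, wave, delta=1.5)
```

With a positive `delta`, the expected COVID share rises and falls with the SARI wave; with `delta = 0` the two parts decouple and the share stays flat.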
Key Results
• Improved prediction precision compared to existing nowcasting approaches for COVID fatalities
• Joint modelling shows better performance than separate models, particularly for capturing trend dynamics
• Age covariate further improved COVID predictions (elderly more likely to be hospitalised for COVID)
• Model successfully captures both temporal trends and delay distributions in real data
Operational Considerations
Current Practice
• Fiocruz runs the InfoGripe surveillance system using INLA (fast, runtime of 2-3 minutes)
• Marginal model approach with simpler computational requirements
Proposed Framework Benefits
• More flexible model specification (handles non-standard distributions)
• Potentially more robust (INLA installation issues, struggles with sparse data)
• Enables joint modelling of multiple data streams
Implementation Challenges
• Computational cost: ~12 hours for rolling nowcast experiment vs. minutes for INLA
• Requires MCMC parameter specification and more programming
• Barrier for new users compared to INLA’s simplicity
Software Development
NimbleCast R Package
• General package for fitting various nowcasting models
• Similar syntax to INLA to facilitate transitions
• Supports both GDM and simpler approaches
• Available on presenter’s GitHub (still in development)
Future Directions
• Extension to multiple viruses (RSV, influenza) alongside COVID within SARI framework
• Forecasting capabilities beyond nowcasting for preventive interventions
• Application to other nested surveillance structures:
- Viral variants within viruses
- Deaths as subset of cases
- Dengue cases (lab-confirmed subset of dengue-like symptoms)
• Potential pathogen interaction modelling with semi-mechanistic approaches
Technical Specifications
• Data: Brazilian SARI surveillance 2021-2024, 27 federal units
• Delays: up to 20 weeks for SARI, 30 weeks for COVID
• Window: 60-week rolling window for analysis
• Implementation: R package NIMBLE for MCMC sampling
• Spatial effects: independent across federal units (extensible to spatiotemporal)
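As a rough illustration of how the window and maximum-delay settings interact, the Python sketch below works out which onset weeks fall in a rolling window and how many delay columns can already have been observed for each. The function names are hypothetical, and whether the actual analysis indexes the window exactly this way is an assumption.

```python
def window_weeks(now, window=60):
    """Onset weeks in a rolling window ending at week `now` (inclusive)."""
    start = max(0, now - window + 1)
    return list(range(start, now + 1))

def max_observed_delay(t, now, d_max=20):
    """Largest delay (in weeks) that can already be observed for onset week t."""
    return min(d_max, now - t)

now = 100
weeks = window_weeks(now)                                  # 60 weeks: 41..100
observable = {t: max_observed_delay(t, now) for t in weeks}
```

Only the most recent `d_max` weeks in the window are incomplete; older weeks have their full delay distribution observed and mainly inform the trend and delay estimates.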
Discussion Points Raised
• Stability of COVID-SARI link over time and with changing epidemiology
• Potential for multiple pathogen interaction terms
• Window length optimisation for operational use
• Alternative approaches (e.g., downloading historic datasets to reconstruct delays, fitting to the counts with a joint model)
• Computational efficiency improvements (new sampling methods in development)