A difference I see here from a lot of our work and the nowcasting literature is that final reported cases are assumed to be known without reporting error. This approach allows the use of a binomial observation model. Instinctively I think this approach is less representative than the Poisson/NB sum approach, but it might be interesting to support as an option (I assume it has different failure modes, which could be useful). This difference might be related to some of the work @johannes and co are doing on nowcasting obs models.
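For context, a minimal sketch of the kind of observation model this enables (illustrative hazards, not the paper's code):

```python
# With the final count N_t treated as known, cases reported at each delay can
# be drawn as a sequential binomial from the cases not yet reported, with the
# reporting hazard as the success probability.
import numpy as np

rng = np.random.default_rng(0)

N_t = 200                                # final reported cases, assumed known
hazard = np.array([0.4, 0.5, 0.6, 1.0])  # illustrative reporting hazards by delay

not_yet_reported = N_t
reports = []
for h in hazard:
    r = rng.binomial(not_yet_reported, h)
    reports.append(int(r))
    not_yet_reported -= r

print(reports, sum(reports))  # sums to N_t since the last hazard is 1
```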
Some other apparent differences:
- it only handles truncation and doesn’t also have support for latent reporting delays.
- it uses an a priori fixed piecewise constant model for all time-evolving parameters vs allowing this to be fitted by the model.
- it looks like the main approach is a hand-rolled Gibbs sampler
- it treats I as an integer (which is what motivates the Gibbs approach) vs propagating expectations
- it looks like it ignores primary censoring of the delay distributions
So after having read it, I think the features that aren’t a subset of epinowcast are:
- treating final counts as known and so using a binomial or beta-binomial obs model
- treating I as a discrete parameter.
I think we want to avoid the latter at least until the Julia framework is in place, so we can do mixed NUTS and Gibbs using robust tools, but we could look at moving up work on approximating this with uncertainty on I that is approximately Poisson (which I know @adrianlison looked at and came away not liking).
As I said above, I am not sure that treating final counts as known totally makes sense, but it is on the roadmap for non-joint delay estimation and so it could be quite easy to support these obs models (I would ideally like to see a comparison of the trade-offs before we commit to the dev work).
Thanks @samabbott for the summary! But I think that the model in this paper is actually very similar to what we have been doing, and the background is that the two points
- treating final counts as known and so using a binomial or beta-binomial obs model
- treating I as a discrete parameter
are actually directly related. The paper models a sequential Binomial sampling of reports, with success probabilities given by the reporting hazard. My understanding is that this is equivalent to a multinomial distribution of reports where the number of trials is I(t) (the total number of cases) and the reporting probabilities can be derived from the hazards (as we do in epinowcast).
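Spelling out the conversion (generic delay index $d$ and hazards $h_d$; not the paper's exact notation):

$$
p_d = h_d \prod_{d' < d} (1 - h_{d'}), \qquad
\big(C_{t,0}, \ldots, C_{t,D}\big) \mid I(t) \sim \operatorname{Multinomial}\big(I(t);\, p_0, \ldots, p_D\big),
$$

with $\sum_d p_d = 1$ as long as the final hazard $h_D = 1$ (i.e. everything is reported by the maximum delay).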
Now if you assume that I(t) is Poisson distributed with mean E[I(t)], then by the properties of the multinomial, the reports (C(t' | t'') - C(t | t') in the paper notation) are also individually Poisson distributed with mean E[I(t)] * p (where p is the reporting probability, not the hazard), which is equivalent to what we are modeling in epinowcast. This also generalizes to the negative binomial, see an earlier post by @johannes (Some thoughts on parameterization of negative binomials).
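A quick simulation sketch of the thinning argument (purely illustrative numbers, not taken from the paper or from epinowcast):

```python
# If I(t) ~ Poisson(lambda) and reports are split multinomially given I(t),
# each delay-d count is marginally Poisson(lambda * p_d) and the counts are
# independent across delays.
import numpy as np

rng = np.random.default_rng(1)

lam = 50.0
p = np.array([0.3, 0.4, 0.2, 0.1])  # reporting probabilities (sum to 1)

I = rng.poisson(lam, size=100_000)
counts = np.array([rng.multinomial(i, p) for i in I])

print(counts.mean(axis=0))    # ~ lam * p
print(counts.var(axis=0))     # ~ lam * p (mean equals variance, as for a Poisson)
print(np.corrcoef(counts.T))  # off-diagonal entries ~ 0
```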
This means that given the E[I(t)], and assuming independent Poisson noise, the reporting model in the paper is equivalent to what we are doing. The difference depends on how we interpret the infection model in epinowcast:
- If we interpret our model as having a deterministic infection process, then the Poisson noise in case numbers must come from reporting errors, so the difference is indeed that we are modeling reporting errors but no infection noise, while the paper does the reverse.
- If we interpret our model as approximating a stochastic infection process, i.e. the number of infections is itself Poisson or negative binomial distributed, then we are, like the paper, modeling no reporting errors. In this case, the only difference between epinowcast and the model in the paper is that by explicitly sampling the I(t), they take into account the dependence in infection noise over time, which is something we ignore when only sampling E[I(t)] (see the toy sketch below).
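A toy illustration of that last point (made-up R and generation interval, nothing to do with either implementation):

```python
# Toy renewal process: sampling integer I(t) lets infection noise at time t
# feed forward into t+1, t+2, ...; using the deterministic E[I(t)] with
# independent Poisson noise per observation does not.
import numpy as np

rng = np.random.default_rng(2)

R = 1.2
gen = np.array([0.3, 0.4, 0.2, 0.1])  # generation interval pmf
T = 30
seed = 20

# (a) stochastic infections: noise propagates through the renewal equation
I = np.zeros(T)
I[0] = seed
for t in range(1, T):
    lam = R * np.sum(I[max(0, t - len(gen)):t][::-1] * gen[:min(t, len(gen))])
    I[t] = rng.poisson(lam)

# (b) deterministic expectations: noise only enters independently per time point
EI = np.zeros(T)
EI[0] = seed
for t in range(1, T):
    EI[t] = R * np.sum(EI[max(0, t - len(gen)):t][::-1] * gen[:min(t, len(gen))])
obs = rng.poisson(EI)

print(I.astype(int))  # autocorrelated deviations from the expected trajectory
print(obs)            # independent deviations around E[I(t)]
```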
In the implementation for "Generative Bayesian modeling to nowcast the effective reproduction number from line list data with missing symptom onset dates" I tried to approximate the infection noise using a continuous distribution, but still used Poisson-distributed cell counts, which basically means that I assumed both infection noise + reporting errors, i.e. even more variation. Not sure if this is justified, but this was the only viable option I could implement in stan. My experience with the models I have tried is that the differences in estimates are typically quite small, especially when case numbers are not extremely low, but that the sampling speed of HMC can differ a lot (I think this has to do with the autocorrelation in the infection process).
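Schematically, that combination looks something like this (a Python sketch only, not the actual stan code; the normal approximation below is just a stand-in for the continuous distribution of the infection noise):

```python
# Continuous latent infections (approximating the infection noise) combined
# with Poisson-distributed reporting cell counts, i.e. both noise sources.
import numpy as np

rng = np.random.default_rng(3)

EI = 80.0                           # expected infections from the renewal model
p = np.array([0.3, 0.4, 0.2, 0.1])  # reporting probabilities by delay

# continuous approximation to the infection noise (placeholder choice)
I_cont = max(rng.normal(EI, np.sqrt(EI)), 0.0)

# Poisson cell counts conditional on the continuous latent infections
cells = rng.poisson(I_cont * p)
print(I_cont, cells)
```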
Just a side remark: I found it interesting that they used stan for one of the steps in their Gibbs sampling algorithm. Sounds cool but also painful in a way…
Also, I think that the skyline / fixed piecewise constant model for R_t is an important limitation. In the tested examples the intervals were always correctly aligned with the changepoints, but they discuss that for real-world examples this could become an issue. Of course this can be mitigated by shortening the intervals, but my guess is that when they get too short the efficiency of the Gibbs sampler will break down, especially when you get into a regime where the R_t intervals are much shorter than the generation time.