Preprint: Overconfident estimates of the reproductive number from the Poisson renewal equation method

Hello everyone,

I would like to bring some attention to a new post on the Epinowcast blog, which summarizes a preprint discussing a shortcoming of the Poisson renewal equation method for estimating the effective reproductive number, R_t. It can be shown that, when the observed data are overdispersed, this method underestimates the uncertainty of the estimate, and by how much follows directly from generalized linear model theory.

This issue is not entirely unknown, but hasn’t been pointed out explicitly. Additionally, the Poisson renewal equation remains one of the most popular tools for R_t estimation, so we believe raising some awareness is beneficial.

The paper is currently under review, so hopefully it’s on the right track to be published. Also big shout-out to the co-authors – @jessalyn, @johannes, Isaac Goldstein and Volodymyr Minin.

Abstract

Time-varying effective reproductive numbers of infectious diseases are commonly estimated using renewal equation models. In the widely applied R package EpiEstim and various related tools, this approach is combined with a Poisson distributional assumption. This has been criticized on various occasions, mostly on grounds of general model realism or a desire to estimate overdispersion parameters. Here we argue that an important issue arising from the Poisson assumption is that inference about the effective reproductive number becomes overconfident in presence of overdispersion. By how much standard errors are underestimated follows in a straightforward manner from theory on generalized linear models. We therefore recommend to replace the Poisson assumption by quasi-Poisson or negative binomial extensions, and contrast their respective properties. We illustrate our arguments in detailed simulation studies and three examples of case studies of Ebola, pandemic influenza and COVID-19.
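
To make the overconfidence concrete, here is a minimal simulation sketch. It is not the preprint's code. It assumes a constant R over the estimation window, made-up generation-interval weights, and negative binomial noise parameterised so that Var[I_t] = phi * E[I_t], which is the quasi-Poisson variance form. All function names are hypothetical.

```python
import numpy as np

def simulate_renewal(R, phi, w, n_days, seed_cases, rng):
    """Simulate incidence with E[I_t] = R * Lambda_t and Var[I_t] = phi * E[I_t]."""
    s = len(w)
    inc = [seed_cases] * s              # flat seeding period (an assumption)
    lam = []
    for _ in range(n_days):
        # Lambda_t = sum_s w_s * I_{t-s}; w[0] weights the most recent day
        L = float(np.dot(w, inc[-s:][::-1]))
        mu = R * L
        if mu <= 0:
            new = 0
        elif phi > 1:
            # negative binomial chosen so that the variance equals phi * mu
            new = rng.negative_binomial(mu / (phi - 1), 1 / phi)
        else:
            new = rng.poisson(mu)
        lam.append(L)
        inc.append(int(new))
    return np.array(inc[s:], dtype=float), np.array(lam)

def fit_constant_R(inc, lam):
    """Poisson MLE of a constant R, with Poisson and quasi-Poisson standard errors."""
    keep = lam > 0
    inc, lam = inc[keep], lam[keep]
    R_hat = inc.sum() / lam.sum()              # solves the Poisson score equation
    se_pois = np.sqrt(R_hat / lam.sum())       # inverse Fisher information
    mu_hat = R_hat * lam
    # Pearson estimate of the dispersion parameter (one fitted parameter)
    phi_hat = np.sum((inc - mu_hat) ** 2 / mu_hat) / (len(inc) - 1)
    se_quasi = np.sqrt(phi_hat) * se_pois      # quasi-Poisson inflates SE by sqrt(phi)
    return R_hat, se_pois, se_quasi, phi_hat

def coverage(R=1.1, phi=5.0, n_rep=500, n_days=60, seed=1):
    """Share of nominal 95% intervals that contain the true R, for both methods."""
    rng = np.random.default_rng(seed)
    w = np.array([0.2, 0.4, 0.3, 0.1])         # illustrative weights, sum to 1
    hits = np.zeros(2)
    for _ in range(n_rep):
        inc, lam = simulate_renewal(R, phi, w, n_days, 100, rng)
        R_hat, se_p, se_q, _ = fit_constant_R(inc, lam)
        hits += [abs(R_hat - R) < 1.96 * se_p, abs(R_hat - R) < 1.96 * se_q]
    return hits / n_rep
```

Calling `coverage()` returns the empirical coverage of the Poisson and quasi-Poisson intervals. With phi = 5 the Poisson standard error is too small by a factor of about sqrt(5), so theory suggests its nominal 95% intervals should cover the truth only around 60% of the time, while the quasi-Poisson intervals should stay close to nominal.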

Thanks for this @barborasobolova, I found the blog format for this very useful.

For those who haven’t seen it, one of the references from this, rtglm, is a nice write-up of the idea that renewal models can be expressed as GAMs/GLMs (though unfortunately it misses that quite a few people have been doing this for a while).
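
For readers new to the idea, one common way to write a renewal model as a GLM is sketched below. The notation is assumed here for illustration (w_s for generation-interval weights, \Lambda_t for the infection pressure, x_t for covariates or basis functions) and is not necessarily the formulation used in rtglm.

```latex
\begin{align*}
I_t \mid I_{1}, \dots, I_{t-1} &\sim \mathrm{Poisson}(R_t \Lambda_t),
  \qquad \Lambda_t = \sum_{s \ge 1} w_s I_{t-s}, \\
\log \mathbb{E}[I_t \mid I_{1}, \dots, I_{t-1}] &= \log R_t + \log \Lambda_t
  = x_t^\top \beta + \log \Lambda_t .
\end{align*}
```

Read this way, \log \Lambda_t is a known offset and \log R_t is the linear predictor of a log-link GLM (or a smooth function of time in a GAM). The quasi-Poisson extension keeps the same mean and sets \mathrm{Var}[I_t \mid \cdot] = \phi R_t \Lambda_t, while the negative binomial extension uses \mathrm{Var}[I_t \mid \cdot] = \mu_t + \mu_t^2 / k with \mu_t = R_t \Lambda_t.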

Is that documented/citable anywhere?

That is a good point. I suppose @jonathon.mellor has some papers around this and people at the CFA have been doing it for a few years. There were also some COVID Rt estimators that did this but I can’t remember what any of them are called.

Other than that I am not so sure - it’s not directly discussed in the hhh4 literature?

Hm, hhh4 and Rt have always been a bit tricky, as hhh4 is usually applied to weekly data and many serial intervals are shorter than that. So it’s lumping generations together anyway and is not always easy to interpret mechanistically in practice. Also, with multivariate models you need to start thinking about next generation matrices and so on.

Applied to weekly data, but not restricted to use in that setting? I thought that it was the same in many cases as a renewal model with importation (with additional stuff going on from the mixing) and that all of that could be expressed as a GLM?

That being said I haven’t thought about it for a long time or spent any serious time on it so perhaps that is just fancy.

Great blog @barborasobolova! Really nice to have the paper to reference in future. We have intuitively been using NBs in our work, but it is extremely useful to have the benefits so clearly displayed.

Reading that Rtglm paper was interesting. I do feel we have done a lot of work that is adjacent to the subject of that paper. While we have rarely produced Rt estimates for outbreaks and written it up, we have done a lot of work on GAM based nowcasting and growth rate estimation.

I’d highlight that this is intentional on our part - not doing Rt and doing rt instead. In an outbreak, or for an endemic (e.g. winter) pathogen, our customers do not care about Rt (I’ve asked if they want it and they say no!). So we produce growth rates, often for different population strata, which again makes us not want to produce a combined Rt, which needs a next generation approach as @johannes highlights (it would be great if someone implemented this somewhere… see “The time-dependent reproduction number for epidemics in heterogeneous populations”, Journal of The Royal Society Interface).

Now, for an emerging pathogen or where interventions are in play, Rt is more interesting to our users, but that’s (thankfully) only a fraction of the times we do responsive modelling. It’s important that the methods for doing this are good!

I suppose I’m glad in a way someone’s published an Rt GAM, but probably slightly disappointed we weren’t referenced or I didn’t just do it myself at some point. Alas.

I’d note that the RtGLM has high overlap with what’s going on here, with a broader tooling set: ai4ci/ggoutbreak: A basic framework and uncomplicated set of statistical models for describing timeseries data, including incidence, proportions, exponential growth rates