Update: a simple reference model for nowcasting

Following up on this first (!) epinowcast forum post with a bit of a project update. The baselinenowcast R package now contains low-level, modular functions that can be used to produce nowcasts from a reporting triangle. Check out the pre-print, where we apply the method, and the blog post describing the contribution.

My summary of what we did is below:

  1. Verified that our outputs matched those of the KIT simple nowcast method (see @dwolffram Wolffram et al. for more details), since this is the method that baselinenowcast is based on. We used the German COVID-19 Nowcast Hub codebase and ran both our method and theirs on the pre-processed reporting triangle. In the process, we found a bug in the implementation of the uncertainty estimation, fixed it in our fork, and re-ran their method. The nowcasts produced by our method and by the revised version of their codebase were virtually identical; in the paper we present these results alongside the comparison against the original implementation. (A minimal sketch of the reporting-triangle completion underpinning the method is included just after this list.)
  2. We built on (1) to see how other specifications of the baselinenowcast method would perform. The package’s modularity makes it easy to modify things like training volume, pass delay estimates across strata, produce weekday-specific estimates and combine them, and use only complete reporting triangles. We tested all of these on the same data and compared them to the default package specifications, which were based on what was used in (1) (and in the KIT simple nowcast Hub submission). We found that both increasing the training data volume and producing weekday-specific nowcasts and combining them improved nowcast performance on this particular dataset, whereas, to our surprise, using data combined across all age groups for delay estimation reduced performance. The improvements made sense when looking at the delay distributions stratified by age group and time: delays shifted, but not rapidly, and there were consistent weekday effects in reporting. The same was true of the reduced performance from using delays estimated on the combined data: the lower age groups clearly had shorter delays, so using a longer delay led to overprediction.
  3. We (and by we I mean co-authors @mariatang and @jonathon.mellor) ran three specifications of the baselinenowcast method on the data from their recent norovirus nowcasting paper and compared their performance to the baseline method they originally used, the epinowcast model they implemented, and the GAM they developed in house. We found that baselinenowcast performed significantly better than the original baseline, and the comparison helped us understand more specifically the ways in which the GAM and epinowcast models improved performance.
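
To make the underlying calculation concrete, here is a minimal sketch in base R of the multiplicative, chain-ladder-style completion of a reporting triangle that this family of methods builds on. This is an illustration only, not the baselinenowcast API: the package splits delay estimation, uncertainty estimation, and their application into separate modular functions and handles many more edge cases.

```r
# A toy reporting triangle: rows are reference times (oldest first), columns
# are reporting delays 0-3, and NA marks counts not yet observed as of the
# nowcast date. Illustration only, not the baselinenowcast data structure.
triangle <- matrix(
  c(40, 20, 10,  5,
    35, 18,  9, NA,
    50, 25, NA, NA,
    45, NA, NA, NA),
  nrow = 4, byrow = TRUE
)

# Chain-ladder-style completion: for each delay column, estimate a scaling
# factor from the reference times already reported at that delay, then use it
# to fill in the missing cumulative counts.
nowcast_point_estimate <- function(tri) {
  cum <- t(apply(tri, 1, cumsum))  # cumulative counts by delay; NA propagates
  for (d in 2:ncol(cum)) {
    reported <- !is.na(cum[, d])
    factor_d <- sum(cum[reported, d]) / sum(cum[reported, d - 1])
    missing <- is.na(cum[, d])
    cum[missing, d] <- cum[missing, d - 1] * factor_d
  }
  cum[, ncol(cum)]  # point nowcast of the eventual total per reference time
}

nowcast_point_estimate(triangle)
```

The actual method differs in the details (for example, how the delay distribution and the uncertainty around the point nowcast are estimated), so treat this purely as intuition for the multiplicative completion step.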

My personal takeaways from this work:

  • Developing a method/package while simultaneously performing an analysis using it was, at times, difficult (i.e. I kept breaking my own analysis code), but overall extremely helpful. The verification against the KIT simple nowcast method, which was only possible because the codebase from @dwolffram, @johannes and co-authors was so easy to run and well maintained, helped us catch a number of mistakes I had originally made in the implementation, and then helped us catch the inconsistency between the original implementation and the mathematical description of the method.

  • Continuing the pattern of catching my own faulty first-pass implementations, the analysis of the different method specifications and the application to the two datasets didn’t work the first time we tried each of them – specifically the weekday effects and the way the method handled sparsely populated data with lots of zeros. Needing to apply the package to real data meant that we identified these issues early and could think through how to address them in the package. I’m sure issues will continue to crop up as others try out the package, but it was a great stress test (and it points to the need for more comprehensive, approximately real-world unit and end-to-end tests within the package).

  • Somewhat already mentioned above, but the whole set of applications was only possible because the authors ensured that their publications had publicly available data, with code that was reproducible and easy for someone outside the project to run. Without this, none of the applications would have been possible, and the actual implementation within the package would likely have been much harder (I personally learn by doing, and the method developed by @johannes and @dwolffram made a lot more sense to me once I could dig into their code).

  • Generalizable software implementations can help catch bugs and prevent the propagation of errors, as we saw with the original implementation of the KIT simple nowcast method (though this had already been fixed in subsequent codebases).

  • Lastly, this project gave me a bit of a resurgence of hope that working across academia and public health institutions can be really fruitful and productive. This project was incredibly collaborative: as mentioned, @jonathon.mellor and @mariatang from UKHSA used the package to produce nowcasts, and @johannes worked with us at all stages to help make sure the method was implemented and communicated correctly. Emily Tsyzka, Laura Jones and Rosa Ergas from the Massachusetts Department of Health, alongside @nickreich, having developed a very similar implementation for their own nowcasting, worked with us throughout the development of the package and the analysis to ensure that the method was communicated clearly. We are now planning subsequent vignettes and analyses to support adoption in the U.S. public health context and to demonstrate the method’s value there, which I am very excited about.

Links and such

  • baselinenowcast package GitHub is here – we would appreciate any and all feedback. As mentioned, we still have to write the full user interface (we are planning a wrapper baselinenowcast() function with different options to implement the epi-specific configurations described in the paper).
  • Paper GitHub – let us know if anything can be clearer. We want to make this as reproducible and easy to work with as the codebases from @johannes, @jonathon.mellor and @mariatang were.
  • Blog post on the epinowcast forum (would love feedback on some of these schematics – communicating this “simple” method is a challenge!)
  • Sam’s Bluesky post. I should do one of these, but social media scares me, so I might remain a lurker…

Looking forward to feedback and thanks again to all co-authors @johannes @mariatang @jonathon.mellor @sbfnk @samabbott @nickreich @dwolffram @barborasobolova


Thanks for writing this @kejohnson9 – some really useful reflections in here which I agree with. I also agree it’s been a really nice model for future collaborations!

Two questions:

What things would you/we have done differently with this project with the benefit of hindsight?

What is the most interesting extension to this work that isn’t application-based? If someone were going to do more research in this direction, what do you think would be productive things to think about?

Unfortunately, this got desk rejected by PLOS comp b as out of scope :frowning:

Anyone got any good ideas for alternative non-profit/ethical journals that won’t require a major rework?

I was quite surprised by this as I thought it was pretty clearly in scope (see 2025-08-26 Plos Comp B Scope? - Sam Abbott's notes), so if anyone has any thoughts on that I would love to hear them. We didn’t get any direct feedback when rejected, so we’re a bit in the dark.

Something I am seeing here is that maybe the abstract didn’t include enough focus on the novel aspects of the work?

As flagged here (@seabbs.bsky.social on Bluesky), I think there is a real gap for an infectious disease modelling journal that is non-profit/ethical (or maybe one exists that I don’t know about?). Does anyone else agree, and if so, is there a path towards one?

This all reminded me of several conversations a while ago about doing a Peer Community In kind of journal thing for infectious disease modelling (it’s a confusing platform): @seabbs.bsky.social on Bluesky

This still seems to be a good idea to me but would definitely take a fair bit of effort from a fair few people.

Things I would have done differently:

  1. Proactively integrated epidemiologically relevant datasets for test-driven development. We eventually got to this point (though there is still lots of room for improving and extending the current testing suite) because of the analyses we did using the COVID and norovirus data, but it would have been better to be proactive about it, as we wasted time fixing both the implementation bugs and the tests in the process.
  2. I think we should have done more EDA on both datasets before determining and running the analyses; that would have helped us identify up front the method specifications we expected to improve performance and then see whether those hypotheses played out.
  3. I would have designed the package with more modularity from the beginning, which would have made it easier to construct the uncertainty estimation in different ways.

Most interesting extension of this work (non-application based):
From a methods perspective, there are still a few major limitations of the chain-ladder approach, including:

  1. Handling of zero initial counts at a particular reference time
  2. Handling only 0s in the initial reference times used for the delay estimate
  3. Making estimates with very limited initial reference times.

(1) is handled in baselinenowcast with the zero-handling method described in the Supplement and in the package docs (based on work in preparation from Morgenstern et al. 2025). (2) and (3) remain limitations: the workaround for (2) is to include a larger number of reference times in the historical dataset so that there are not only 0s at any of the initial delays, and the workaround for (3) is to truncate the maximum delay so that it is one less than the number of reference times available (a small sketch of this truncation is included at the end of this post).

However, future work could focus on making similarly reasonable approximations to the one described for (1) and implemented in the current package. This would reduce the limitations of the current method and lead to fewer potential hiccups when applying it to very early outbreaks with very low numbers of initial cases.
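
As a concrete illustration of the workaround for (3), here is a minimal sketch in base R; the helper name is mine, not part of the package. It caps the maximum delay at one less than the number of available reference times, so that at least one reference time is fully observed at every retained delay.

```r
# Sketch only: truncate the reporting triangle's delay dimension so that the
# maximum delay is one less than the number of reference times. The helper
# name is hypothetical, not a baselinenowcast function.
truncate_max_delay <- function(triangle) {
  n_ref_times <- nrow(triangle)
  max_delay <- min(ncol(triangle), n_ref_times) - 1  # delays are indexed from 0
  triangle[, seq_len(max_delay + 1), drop = FALSE]   # keep delays 0..max_delay
}

# Example: only 3 reference times are available but there are columns for
# delays 0-6; after truncation the triangle keeps delays 0, 1 and 2.
small_triangle <- matrix(NA_real_, nrow = 3, ncol = 7)
dim(truncate_max_delay(small_triangle))  # 3 x 3
```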