Following up on this first (!) epinowcast forum post with a bit of a project update. The baselinenowcast R package now contains low-level modular functions that can be used to produce nowcasts from a reporting triangle. Check out the preprint, where we apply the method, and the blog post describing the contribution.
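For anyone less familiar with reporting triangles, here is a rough illustration of what "producing a nowcast from a reporting triangle" can look like. This is a minimal Python sketch that assumes a chain-ladder-style multiplicative approach. It is not the baselinenowcast API (the package is written in R). The function names (`chain_ladder_factors`, `delay_pmf`, `point_nowcast`) and the toy counts are hypothetical, and the package's actual delay estimation and uncertainty step (omitted here) may differ in detail.

```python
import numpy as np


def chain_ladder_factors(triangle):
    """Development factors f[d] = (sum of C[d]) / (sum of C[d-1]).

    Rows are reference dates (oldest first), columns are delays 0..D,
    and np.nan marks cells that have not been reported yet. C is the
    cumulative count by delay; sums run over rows observed up to delay d.
    """
    cum = np.cumsum(triangle, axis=1)  # NaN propagates to all later delays
    factors = np.ones(triangle.shape[1])
    for d in range(1, triangle.shape[1]):
        rows = ~np.isnan(cum[:, d])  # rows observed up to delay d
        factors[d] = cum[rows, d].sum() / cum[rows, d - 1].sum()
    return factors


def delay_pmf(factors):
    """Convert development factors into a delay probability mass function."""
    # Product of all factors after delay d (1.0 for the final delay).
    later = np.append(np.cumprod(factors[::-1])[::-1][1:], 1.0)
    cdf = 1.0 / later  # proportion of the final count reported by delay d
    return np.diff(cdf, prepend=0.0)


def point_nowcast(triangle, factors):
    """Scale each row's latest cumulative count up to its expected final total."""
    cum = np.cumsum(triangle, axis=1)
    totals = np.empty(triangle.shape[0])
    for i, row in enumerate(cum):
        last = np.flatnonzero(~np.isnan(row))[-1]  # last observed delay
        totals[i] = row[last] * np.prod(factors[last + 1:])
    return totals


# Toy triangle: 4 reference dates, delays 0-2, most recent rows incomplete.
triangle = np.array([
    [50, 30, 20],
    [10, 6, 4],
    [20, 12, np.nan],
    [40, np.nan, np.nan],
])
factors = chain_ladder_factors(triangle)
print(delay_pmf(factors))                # roughly [0.5, 0.3, 0.2]
print(point_nowcast(triangle, factors))  # roughly [100, 20, 40, 80]
```

Keeping delay estimation (`chain_ladder_factors`) separate from applying the delay (`point_nowcast`) is one way to get the kind of modularity described here, since either step can be swapped without touching the other.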
My summary of what we did is below:
1. Verified that the outputs matched those of the KIT simple nowcast method (see @dwolffram Wolffram et al. for more details), since this is the method baselinenowcast is based on. We used the German COVID-19 Nowcast Hub codebase and ran both our method and theirs on the pre-processed reporting triangle. In the process, we found a bug in their implementation of the uncertainty estimation, fixed it in our fork, and re-ran their method. The nowcasts produced by our method and by the revised version of their codebase were virtually identical. In the paper we present these results alongside a comparison with the original implementation.
2. We built on (1) to see how other specifications of the baselinenowcast method would perform. The package's modularity makes it easy to change things like the training data volume, pass estimates across strata, produce weekday-specific estimates and combine them, and use only complete reporting triangles. We tested all of these on this data and compared them to the default package specifications, which were based on what was used in (1) (and in the KIT simple nowcast Hub submission). We found that both increasing the training data volume and producing weekday-specific nowcasts and combining them improved nowcast performance on this particular dataset. To our surprise, using data combined across all age groups for delay estimation reduced performance. The improvements made sense when looking at the delay distributions stratified by age group and time: delays shifted, but not rapidly, and there were consistent weekday effects in reporting. The same goes for the reduction in performance from using delays estimated from the combined data: there was clearly a shorter delay in the lower age groups, so using a longer delay led to overprediction.
3. We (and by we I mean co-authors @mariatang and @jonathon.mellor) ran three specifications of the baselinenowcast method on the data from their recent norovirus nowcasting paper. We compared their performance to the baseline method they originally used, the epinowcast model they implemented, and the GAM they developed in-house. We found that baselinenowcast performed significantly better than the original baseline. It also helped us understand more specifically how the GAM and epinowcast models improved performance.
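Two of the specifications described above, the training data volume and weekday-specific delay estimates combined afterwards, can be sketched in the same hedged spirit. This is a hypothetical Python illustration, not the package's implementation. It assumes chain-ladder-style factors and daily rows in date order, with the row index mod 7 standing in for the weekday. It uses a toy triangle in which one weekday reports later than the rest. The `n_train` argument is an illustrative stand-in for "training volume", not a real package parameter.

```python
import numpy as np


def estimate_factors(triangle, n_train=None):
    """Chain-ladder development factors, optionally from the last n_train rows only."""
    if n_train is not None:
        triangle = triangle[-n_train:]  # hypothetical "training volume" knob
    cum = np.cumsum(triangle, axis=1)  # NaN propagates past the last report
    factors = np.ones(triangle.shape[1])
    for d in range(1, triangle.shape[1]):
        rows = ~np.isnan(cum[:, d])  # rows observed up to delay d
        factors[d] = cum[rows, d].sum() / cum[rows, d - 1].sum()
    return factors


def apply_factors(triangle, factors):
    """Scale each row's latest cumulative count up to an expected final total."""
    cum = np.cumsum(triangle, axis=1)
    totals = np.empty(len(triangle))
    for i, row in enumerate(cum):
        last = np.flatnonzero(~np.isnan(row))[-1]  # last observed delay
        totals[i] = row[last] * np.prod(factors[last + 1:])
    return totals


def weekday_nowcast(triangle, n_train=None):
    """Estimate delays separately for each weekday, then recombine in date order.

    Assumes rows are consecutive days, so rows i and i + 7 share a weekday.
    Each weekday subset needs at least one fully observed row.
    """
    totals = np.empty(len(triangle))
    for wd in range(7):
        idx = np.arange(wd, len(triangle), 7)
        sub = triangle[idx]
        totals[idx] = apply_factors(sub, estimate_factors(sub, n_train))
    return totals


# Toy data: 14 days, delays 0-2, final total of 10 per day.
usual = [5.0, 3.0, 2.0]
late = [2.0, 3.0, 5.0]  # weekday 6 reports later than the others
triangle = np.array([late if i % 7 == 6 else usual for i in range(14)])
triangle[12, 2] = np.nan   # day 12 observed up to delay 1
triangle[13, 1:] = np.nan  # day 13 observed at delay 0 only

print(weekday_nowcast(triangle))  # every day is nowcast at 10
pooled = apply_factors(triangle, estimate_factors(triangle))
print(pooled[-1])  # about 4.2: pooled delays miss the late weekday
```

In this toy case, pooling all weekdays dilutes the late-reporting weekday's delays, so its most recent day is badly underpredicted. Estimating per weekday recovers it. The same split-estimate-recombine pattern would apply to strata such as age groups. With real data, per-weekday estimates trade this bias for fewer rows per estimate.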
My personal takeaways from this work:
- Developing a method/package while simultaneously performing an analysis with it was, at times, difficult (i.e. I kept breaking my own analysis code), but overall extremely helpful. The verification against the KIT simple nowcast method was only possible because the codebase of @dwolffram, @johannes and co-authors was so easy to run and well maintained. It helped us catch a number of mistakes I originally made in the implementation. It then helped us catch the inconsistency between the original implementation and the mathematical description of the method.
- Continuing the pattern of catching my own faulty first-pass implementations, the analyses of the different method specifications and the application to the two datasets didn't work the first time we tried each of them. The problems were specifically with the weekday effects and with the way the method handled sparsely populated data with lots of zeros. Having to apply the package to real data meant that we identified these issues early and were able to think through how to address them in the package. I'm sure issues will continue to crop up as others try out the package, but it was a great stress test. It also points to the need for more comprehensive, approximately real-world unit and end-to-end tests within the package.
- As mentioned above, the whole set of applications was only possible because the authors ensured that their publications had publicly available data, with code that was reproducible and easy for someone outside the project to run. Without this, none of the applications would have been possible, and the actual implementation within the package would likely have been much harder. I personally learn by doing, and the method developed by @johannes and @dwolffram made a lot more sense to me once I could dig into their code.
- Generalizable software implementations can help catch bugs and prevent propagation of errors, as we saw with the original implementation of the KIT simple nowcast method (though this was already fixed in subsequent codebases).
- Lastly, this project gave me a bit of a resurgence of hope that working across academia and public health institutions can be really fruitful and productive. This project was incredibly collaborative. As mentioned, @jonathon.mellor and @mariatang from UKHSA used the package to produce nowcasts. @johannes worked with us at all stages to help make sure the method was implemented and communicated correctly. Emily Tsyzka, Laura Jones and Rosa Ergas from the Massachusetts Department of Health, alongside @nickreich, had developed and implemented a very similar method for their own nowcasting. They worked with us throughout the development of the package and analysis to ensure that the method was communicated clearly. We are now planning further vignettes and analyses to ensure adoption in the U.S. public health context and to demonstrate the method's importance there, which I am very excited about.
Links and such
- baselinenowcast package GitHub is here. I'd appreciate any and all feedback. As mentioned, we still have to write the full user interface (we're planning a wrapper `baselinenowcast()` function with different options to implement the epi-specific configurations described in the paper).
- Paper GitHub: let us know if anything could be clearer. We want to make this as reproducible and easy to work with as the codebases of @johannes, @jonathon.mellor and @mariatang were.
- Blog post on the epinowcast forum (I'd love feedback on some of these schematics. Communicating this "simple" method is a challenge!)
- Sam's Bluesky post. I should do one of these, but social media scares me, so I might remain a lurker…
Looking forward to feedback and thanks again to all co-authors @johannes @mariatang @jonathon.mellor @sbfnk @samabbott @nickreich @dwolffram @barborasobolova