Include a simple reference model

johannes · 23 August 2022 13:26

I’ve discussed this with @samabbott and we think it would make sense to include a very simple reference model in the package. It should fulfil the following:

- runs instantly
- no MCMC or other fancy stuff that can go wrong
- easy to understand
- provides nowcast intervals
- nowcasts not horrible

The KIT-simple_nowcast model I set up for covid19nowcasthub.de/ corresponds more or less to these criteria.

See description here: hospitalization-nowcast-hub/kit-simple_nowcast.pdf at 6c2dbf3aa7614a5c9fbb3d837e164cac9a39025a · KITmetricslab/hospitalization-nowcast-hub · GitHub

R code: hospitalization-nowcast-hub/code/baseline at main · KITmetricslab/hospitalization-nowcast-hub · GitHub

Essentially it uses simple multiplication factors to get point nowcasts and then uses past nowcast errors to quantify the uncertainty (there’s a little twist in there in order to avoid discarding recent nowcast-observation pairs).
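To make the multiplication-factor idea concrete, here is a minimal Python sketch of the point nowcast step only. It is not the KIT R code linked above: the function names, the input layout (one list of counts by delay per fully observed reference date) and the pooled way of estimating the factors are all assumptions made for illustration, and the uncertainty step based on past nowcast errors is left out.

```python
from itertools import accumulate


def multiplication_factors(complete_rows):
    """Estimate one multiplication factor per delay (hypothetical helper).

    complete_rows: list of equal-length lists; complete_rows[t][d] is the
    number of cases for reference date t that were reported with delay d.
    Only fully observed reference dates should be passed in.
    """
    n_delays = len(complete_rows[0])
    if any(len(row) != n_delays for row in complete_rows):
        raise ValueError("all rows must cover the same delays")

    # Pool the cumulative counts reported by each delay over all dates.
    cum_totals = [0.0] * n_delays
    for row in complete_rows:
        for d, cum_count in enumerate(accumulate(row)):
            cum_totals[d] += cum_count

    final_total = cum_totals[-1]
    # factor[d] = final total / total reported by delay d (assumed pooling).
    return [final_total / c if c > 0 else float("nan") for c in cum_totals]


def point_nowcast(observed_so_far, max_delay_observed, factors):
    """Scale the count observed so far up to an expected final count."""
    return observed_so_far * factors[max_delay_observed]


# Two fully observed dates with delays 0, 1 and 2.
factors = multiplication_factors([[50, 30, 20], [40, 40, 20]])
# A recent date with 80 cases reported after delays 0 and 1.
print(point_nowcast(80, 1, factors))  # 100.0
```

Note that a count of zero stays zero whatever the factor is, which is one reason a purely multiplicative scheme needs extra care with zeros.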

What would need to be done is more or less the following:

- adapt to same input format and other conventions of the package
- make a bit more robust and general, e.g., add options for daily data vs seven-day sums
- document properly

I’ll have a bit more time on my hands from September onwards, so I should be able to contribute this in the not-too-distant future. As I’m not the most experienced at contributing to complex packages, I might need some help there.

samabbott · 23 August 2022 13:48

Our first post, how exciting! Thanks for putting this out in public.

I think this is a really great idea.

As I mentioned, @teojcryan and I were discussing a baseline model for his project, and what we came up with in the end sounded awfully like a less nicely thought-out version of your approach.

Given that, and that I imagine people are often going to want a baseline model, it makes a lot of sense to support this one. We have also seen that it performs pretty well in the Germany nowcasting hub, so it is important not to minimise its potential as something to be used in its own right.

Some other things to consider are the following:

- Should this be instead of or as well as the current epinowcast model? One option here would be to always produce a baseline and store it in the returned value from epinowcast, for example. Alternatively, it could be a drop-in replacement. Whatever the choice, it likely needs to work on its own for settings where it is good enough, Stan isn’t available, or another model is being evaluated.
- How do we integrate the output into the current plotting functionality? This depends a bit on the above, in that we might want both nowcasts to be plotted, but again it likely needs to work on its own.
- At some point we may want to support multiple nowcasting methods/back-ends. Do we want the baseline model to be an example of how to add one? I am thinking likely no, at least for the first pass, as it is somewhat of an exception and this would also add quite a bit of complexity.

As @johannes says the first task is likely to integrate his baseline model with the preprocessing happening in the package and make its output link up with low-level plotting functionality etc. From there we can think about integration.

It sounds like some support with how to integrate in a package would be useful. If anyone wants to volunteer some time that would be great but also very happy to help.

samabbott · 24 August 2022 12:43

With a little more thought, implementation-wise it might be nice to have a function like add_baseline() that adds a baseline nowcast to an enw_preprocess_data object (and so also to the output from epinowcast()). That just leaves integrating into plotting on both a low level and a high level (i.e. developer functions vs the S3 interface).

adrianlison · 27 September 2022 12:52

Like this idea, both!

Another thing that could make sense to have explicitly is a “naïve nowcast”, i.e. simply not doing any nowcasting and just returning the cases observed up until present. If we extend epinowcast to forecasting capability, this can be turned into the naïve forecasting approach described by Hyndman and Athanasopoulos here.

I used something like this before to assess the added value of nowcasting vs. no nowcasting at all. Can also be used in some kind of scaled MAE.
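As a rough illustration of the naïve nowcast and of one way to read "scaled MAE", the Python sketch below divides a model's mean absolute error by that of the naïve nowcast. The function names and the exact scaling are assumptions for illustration; the post does not specify which scaled error was used.

```python
def naive_nowcast(observed_so_far):
    """Return the counts reported so far, unchanged (no nowcasting)."""
    return list(observed_so_far)


def mean_absolute_error(predictions, final_values):
    """Average absolute difference between predictions and final counts."""
    if len(predictions) != len(final_values):
        raise ValueError("inputs must have the same length")
    errors = [abs(p - f) for p, f in zip(predictions, final_values)]
    return sum(errors) / len(errors)


def scaled_mae(model_nowcast, observed_so_far, final_values):
    """MAE of the model relative to the naive nowcast (assumed definition).

    Values below 1 mean the model beats doing no nowcasting at all.
    """
    naive_mae = mean_absolute_error(naive_nowcast(observed_so_far), final_values)
    if naive_mae == 0:
        raise ValueError("naive nowcast is already perfect; ratio undefined")
    return mean_absolute_error(model_nowcast, final_values) / naive_mae


# Made-up numbers: two dates, counts seen so far vs eventually reported.
observed = [40, 60]
final = [100, 80]
model = [90, 85]
print(scaled_mae(model, observed, final))  # 0.1875
```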

samabbott · 29 September 2022 13:35

That is a great idea @adrianlison.

In terms of the baseline model, the code is already very clean and functional. Thinking about this a bit more, it seems like a shame to force users to install Stan etc. if all they want is to run this model. Instead of adding to epinowcast, what do we think of making a new package, perhaps called baseline (?), in the epinowcast org to host this?

In the first instance, it can be designed to work with the data processing as available in epinowcast but also to work on its own. This is probably our direction of travel anyway with it likely being sensible in the future to remove data pre- and postprocessing from the current package.

I am not immediately sure what this would mean for plotting etc. (i.e. would you make a plotting function in baseline with the idea of moving it out in the future, or adapt the output from the model to work with the current epinowcast plotting?).

We could also add a naive model to this package (i.e. do nothing) as @adrianlison suggests and have that in the same format, which would make future general integration easier.

What do people think of this (in particular @johannes)? If this is the preferred option, I can make some progress on packaging this and we can go from there. Plotting can be the last problem to resolve, as most users can make their own plots (and likely will) and planning how to integrate lots of things that don’t yet exist is very hard.

samabbott · 30 September 2022 12:29

We had a short discussion of options here at today’s short notice meeting. Minutes here: Short-dated epinowcast meeting, 2022-09-30 - #6 by alison

pearsonca · 11 October 2022 15:52

Had a chat w/ @samabbott about how this overlapped with some work we have in SA (basically, generating varying synthetic time series corresponding to a few scenarios), and it might be that this approach is better than the work that’s already been done. Will have a look and see about evaluating prospects of packaging it into something sleek for generating synthetic series.

kejohnson9 · 20 January 2025 12:32

Bringing this thread back to life as we have begun to scope out what a baseline (name TBD) R package might look like. I think the idea would be that this would act as a sort of companion package to epinowcast, using the epinowcast pre- and post-processing but not requiring the installation of Stan.

The hope would be for it to be a true baseline multiplicative model, to be compared with other methods that might have more extended features, e.g. time-varying delay estimation, hierarchical delay estimation across groups, weekday effects, or more mechanistic models that estimate latent infections and then apply reporting delays.

@johannes Would love to connect and get any feedback you might have on this proposal now that this post is a few years old. I’ve scoped out some technical specifications in this google doc Baseline Tech Specs - Google Docs

johannes · 20 January 2025 16:31

Hi @kejohnson9, I’d be happy to chat and get involved. I am not super proficient at writing R packages and not overly familiar with the epinowcast code base, but it would be great to make some progress on this. Should we have a Zoom call sometime soon?

My code for a simple multiplier model is here: RESPINOW-Hub/code/baseline at main · KITmetricslab/RESPINOW-Hub · GitHub

kejohnson9 · 20 January 2025 16:48

Sounds great, yes let’s find a time, and we can loop in @samabbott as well, as he has more of the historical context.

We’ve been going back and forth a bit on the google doc, let me know if you can’t access and feel free to leave any thoughts you might have.

Thanks for the code, I had been looking at the older version

samabbott · 20 January 2025 17:41

Excited to kick this back off. Johannes, for some more context: some collabs in the US have also been developing and using their own baseline model, so the idea would be to fuse those together, make it into a tool that is really easy for others to use, and write up an eval to guide them in doing so. Finally, we would do some work to support people in the US to deploy this (maybe with webR woooooo).

Code is looking pretty swanky these days. I had remembered about the split datasets for estimation and nowcasting but not the location of the code.

jonathon.mellor · 20 January 2025 17:54

I have no specific suggestions, but just chipping in to say it would be great to see a reference model and keen to support in any way we can.

We (mostly Maria Tang) spent quite a lot of time the last few months going back and forth on different baseline approaches for our measles nowcast without a satisfying conclusion.

Part of the back-and-forth was trying to work out in the absence of a nowcast model what our epi colleagues would do to communicate the backfilling and trying to model that, but again no nice answer.

samabbott · 21 January 2025 10:03

Is there anything public domain on this or that you can communicate directly?

johannes · 21 January 2025 16:20

Great to see many people are interested in this. What is the easiest way to agree on a meeting time? As we are split between the US East Coast and Europe, I guess we’ll need a slot in the morning Eastern Time / afternoon Europe. Maybe it’s sufficient to vote via emojis. Here are my options for the coming days:
Friday 24 January say 9am ET / 3pm Germany / 2pm UK
same on Tuesday 29 January
both work
I think this site only supports one emoji per person, so can’t propose more options right now

mariatang · 22 January 2025 09:57

Baselines considered:

- no correction baseline (naïve but not realistically what epi colleagues would expect)
- cut off last few weeks and repeat the previous week(s) (i.e. thinking that the data will be about the same)
- cut off last few weeks and forecast (i.e. not trusting data in last few weeks, but assuming a continuing trend)
- simple nowcast using a delay distribution convolution, based on simplest nowcast in NFIDD course (adjusting for delays but not any trends)
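For illustration, the second option in the list (cut off and repeat) could be sketched in Python as below. The function name, the choice to repeat only the single last trusted week, and the example numbers are assumptions, not the code used for the measles nowcast.

```python
def cutoff_and_repeat(weekly_counts, n_cut):
    """Drop the last n_cut weeks and repeat the last trusted week instead.

    weekly_counts: counts as currently reported, oldest week first.
    n_cut: number of most recent weeks considered too incomplete to trust.
    """
    if n_cut <= 0:
        return list(weekly_counts)  # nothing cut: same as no correction
    if n_cut >= len(weekly_counts):
        raise ValueError("need at least one trusted week")
    trusted = list(weekly_counts[:-n_cut])
    # Assume the untrusted weeks look like the last trusted one.
    return trusted + [trusted[-1]] * n_cut


# Made-up series where the last two weeks are still being backfilled.
print(cutoff_and_repeat([10, 12, 15, 9, 4], n_cut=2))  # [10, 12, 15, 15, 15]
```

Setting `n_cut` to zero reproduces the no-correction baseline from the first item.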

We decided to go for cut off and forecast (with a GAM) as a baseline for our measles nowcast, because we thought that might best represent what epi colleagues might try and do in their heads to adjust when looking at data without a nowcast

Also one of our PhD interns did a bit of a lit review on baselines in nowcasting literature, which I can send over via email

samabbott · 22 January 2025 11:11

This would be amazing.

Ah fun glad to see this getting used.

We decided to go for cut off and forecast

I think this is quite reasonable. Not sure how/if to include this type of approach in this evaluation though as it obviously requires a forecasting model (and assumes in some sense the end point is a forecast which it may not always be).

@johannes I can do any of those times.

johannes · 22 January 2025 12:52

I had to document this anyway for the Supplement of a paper where we are using it, so here’s a summary of how our simplistic method works in a standard case of weekly data and weekly data releases. A similar description is also available in the supplement of Wolffram et al, but it’s a bit tedious to read as we had to deal with sums over seven day windows.

On a side note, I think I may have an elegant solution to handle the issue with applying multiplication factors to zeros. But it’s essentially something a collaborator suggested for a related problem in a different project. So can’t share this here without checking back with them first.

kejohnson9 · 22 January 2025 14:44

I can do any of those times as well!

johannes · 22 January 2025 15:03

Great, then let’s just go with Friday 24 January 9am ET / 3pm Germany / 2pm UK. @samabbott is there some Zoom channel you usually use for meetings? Otherwise I can send a link.

kejohnson9 · 12 February 2025 18:51

Summary of action plan following this meeting:

Primary goal: build an R package based on the multiplicative nowcast method @johannes set up for the German hospitalization nowcast, to address a public health need for an easy-to-use implementation of this method
Secondary goal: to use this model as a baseline nowcast model for evaluating other nowcasting methods
- package should be flexible enough to allow users to estimate delay from one stratum and apply it to another stratum
- provide some sort of guidance on how to pre-process data into the format the package needs, either through exported package helper functions or using other methods, e.g. epinowcast package pre-processing
- UKHSA (@jonathon.mellor & @mariatang ) are willing to review as new releases are rolled out, test on their data, and eventually integrate into their model evaluation framework for nowcasting
- handling of zero values: still to be discussed
- @kejohnson9 & @samabbott & @sbfnk to lead on the development, with @johannes to review as desired and as a resource if stuck
Analysis plan: still to be ironed out, plan to evaluate under different data conditions via simulation applied to real-data + real-world data examples, for different specifications of the model. To put in context of other models, can compare to nowcasts produced as part of German Forecast Hub described in Wolffram et al. Goal would be to write into a publication.
