Include a simple reference model

I’ve discussed this with @samabbott and we think it would make sense to include a very simple reference model in the package. It should meet the following criteria:

  • runs instantly
  • no MCMC or other fancy stuff that can go wrong
  • easy to understand
  • provides nowcast intervals
  • nowcasts not horrible

The KIT-simple_nowcast model I set up for covid19nowcasthub.de/ more or less meets these criteria.

Description: hospitalization-nowcast-hub/kit-simple_nowcast.pdf at 6c2dbf3aa7614a5c9fbb3d837e164cac9a39025a · KITmetricslab/hospitalization-nowcast-hub · GitHub
R code: hospitalization-nowcast-hub/code/baseline at main · KITmetricslab/hospitalization-nowcast-hub · GitHub

Essentially it uses simple multiplication factors to get point nowcasts and then uses past nowcast errors to quantify the uncertainty (there’s a little twist in there to avoid discarding recent nowcast-observation pairs).
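
To make the idea concrete, here is a rough sketch of a multiplicative nowcast with error-based intervals. It is written in Python purely for illustration and is not the KIT R code: the function names, the reporting-triangle layout (`tri[t, d]` = cases for reference date `t` reported with delay `d`) and the use of empirical quantiles of past ratios are all assumptions. It also simply discards recent nowcast-observation pairs, so it does not include the twist mentioned above.

```python
import numpy as np


def point_nowcast(tri, now):
    """Multiplicative point nowcasts of final totals (hypothetical helper).

    tri[t, d] holds the cases for reference date t reported with delay d.
    Only entries with t + d <= now are used, mimicking the data known at `now`.
    """
    tri = np.asarray(tri, dtype=float)
    max_delay = tri.shape[1] - 1
    first_incomplete = now - max_delay + 1
    if first_incomplete < 1:
        raise ValueError("need at least one fully observed reference date")
    # Cumulative counts by delay, summed over all fully observed dates.
    cum = tri[:first_incomplete].cumsum(axis=1).sum(axis=0)
    # factors[k] scales a count observed up to delay k to a final total.
    # Assumes cum[k] > 0; zeros would need special handling.
    factors = cum[-1] / cum
    return {
        t: tri[t, : now - t + 1].sum() * factors[now - t]
        for t in range(first_incomplete, now + 1)
    }


def nowcast_intervals(tri, now, level=0.9):
    """Intervals from empirical quantiles of past (final total / nowcast) ratios.

    Assumes now >= 2 * max_delay so that every horizon has past pairs.
    Returns {t: (lower, point, upper)} for each incomplete reference date.
    """
    tri = np.asarray(tri, dtype=float)
    max_delay = tri.shape[1] - 1
    ratios = {k: [] for k in range(max_delay)}
    for past in range(max_delay, now):
        for t, value in point_nowcast(tri, past).items():
            # Keep only pairs whose final total is already known at `now`.
            if t + max_delay <= now and value > 0:
                ratios[past - t].append(tri[t].sum() / value)
    lower, upper = (1 - level) / 2, (1 + level) / 2
    out = {}
    for t, value in point_nowcast(tri, now).items():
        q = np.quantile(ratios[now - t], [lower, upper])
        out[t] = (value * q[0], value, value * q[1])
    return out
```

Everything here is closed-form arithmetic on the triangle, so it runs instantly and there is no sampler that can fail, which is the main appeal for a reference model.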

What would need to be done is more or less the following:

  • adapt to same input format and other conventions of the package
  • make a bit more robust and general, e.g., add options for daily data vs seven-day sums
  • document properly

I’ll have a bit more time on my hands from September onwards, so I should be able to contribute this in the not-too-distant future. As I’m not the most experienced at contributing to complex packages, I might need some help there :wink:

Our first post, how exciting! Thanks for putting this out in public.

I think this is a really great idea.

As I mentioned, @teojcryan and I were discussing a baseline model for his project, and what we came up with in the end sounded awfully like a less nicely thought out version of your approach.

Given that, and that I imagine people are often going to want a baseline model, it makes a lot of sense to support this one. We have also seen that it performs pretty well in the Germany nowcasting hub, so it is important not to minimise its potential as something to be used in its own right.

Some other things to consider are the following:

  • Should this be instead of or as well as the current epinowcast model? One option here would be to always produce a baseline and store it in the value returned from epinowcast, for example. Alternatively, it could be a drop-in replacement. Whatever the choice, it likely needs to work on its own for settings where it is good enough, Stan isn’t available, or another model is being evaluated.
  • How do we integrate the output into the current plotting functionality? This depends a bit on the above in that we might want both nowcasts to be plotted but again likely needs to work on its own.
  • At some point we may want to support multiple nowcasting methods/back-ends. Do we want the baseline model to be an example of how to add one? I am thinking likely not, at least for the first pass, as it is somewhat of an exception and this would also add quite a bit of complexity.

As @johannes says, the first task is likely to integrate his baseline model with the preprocessing happening in the package and make its output link up with the low-level plotting functionality etc. From there we can think about integration.

It sounds like some support with how to integrate this into a package would be useful. If anyone wants to volunteer some time that would be great, but I am also very happy to help.

With a little more thought, implementation-wise it might be nice to have a function like add_baseline() that adds a baseline nowcast to an enw_preprocess_data object (and so also to the output from epinowcast()). That just leaves integrating into plotting on both a low level and a high level (i.e. developer functions vs the S3 interface).

Like this idea, both!

Another thing that could make sense to have explicitly is a “naïve nowcast”, i.e. simply not doing any nowcasting and just returning the cases observed up until present. If we extend epinowcast to forecasting capability, this can be turned into the naïve forecasting approach described by Hyndman and Athanasopoulos here.

I used something like this before to assess the added value of nowcasting vs. no nowcasting at all. Can also be used in some kind of scaled MAE.
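
As a rough illustration of what I mean (sketched in Python rather than R; all function names are made up for this example), the naïve nowcast just returns the counts observed so far, and a scaled MAE divides the model’s MAE by that of the naïve nowcast:

```python
def mean_absolute_error(predicted, actual):
    """Mean absolute error between two equally long sequences."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def naive_nowcast(observed_so_far):
    """No nowcasting at all: report the counts observed up to the present."""
    return list(observed_so_far)


def scaled_mae(model_nowcast, observed_so_far, final_counts):
    """MAE of the model relative to the MAE of the naive nowcast.

    Values below 1 mean the model adds value over not nowcasting.
    """
    naive_error = mean_absolute_error(naive_nowcast(observed_so_far), final_counts)
    if naive_error == 0:
        raise ValueError("naive nowcast is already perfect; ratio undefined")
    return mean_absolute_error(model_nowcast, final_counts) / naive_error


# Illustrative numbers only, not real data.
observed = [50, 80, 90]    # counts reported so far for three dates
final = [100, 100, 100]    # counts once reporting is complete
model = [90, 95, 100]      # some model's point nowcasts
score = scaled_mae(model, observed, final)
```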

That is a great idea @adrianlison.

In terms of the baseline model, the code is already very clean and functional. Thinking about this a bit more, it seems like a shame to force users to install Stan etc. if all they want is to run this model. Instead of adding it to epinowcast, what do we think of making a new package, perhaps called baseline (?), in the epinowcast org to host this?

In the first instance, it can be designed to work with the data processing as available in epinowcast but also to work on its own. This is probably our direction of travel anyway with it likely being sensible in the future to remove data pre- and postprocessing from the current package.

I am not immediately sure what this would mean for plotting etc. (i.e. would you make a plotting function in baseline with the idea of moving it out in the future, or adapt the output from the model to work with the current epinowcast plotting?).

We could also add a naive model to this package (i.e do nothing) as @adrianlison suggests and have that in the same format which would make future general integration easier.

What do people think of this (in particular @johannes)? If this is the preferred option I can make some progress on packaging this and we can go from there. Plotting can be the last problem to resolve, as most users can make their own plots (and likely will), and planning how to integrate lots of things that don’t yet exist is very hard.

We had a short discussion of options here at today’s short notice meeting. Minutes here: Short-dated epinowcast meeting, 2022-09-30 - #6 by alison

Had a chat w/ @samabbott about how this overlaps with some work we have in SA (basically, generating varying synthetic time series corresponding to a few scenarios), and it might be that this approach is better than the work that’s already been done. Will have a look and see about evaluating prospects of packaging it into something sleek for generating synthetic series.

Bringing this thread back to life as we have begun to scope out what a baseline (name TBD) R package might look like. I think the idea would be that this would act as a sort of companion package to epinowcast, using the epinowcast pre and post-processing but not requiring the installation of Stan.

The hope would be for it to be a true baseline multiplicative model, to be compared with other methods that might have more extended features, e.g. time-varying delay estimation, hierarchical delay estimation across groups, weekday effects, or more mechanistic models that estimate latent infections and then apply reporting delays.

@johannes Would love to connect and get any feedback you might have on this proposal now that this post is a few years old. I’ve scoped out some technical specifications in this google doc Baseline Tech Specs - Google Docs

Hi @kejohnson9, I’d be happy to chat and get involved. I am not super proficient at writing R packages and not overly familiar with the epinowcast code base, but it would be great to make some progress on this. Should we have a Zoom call sometime soon?

My code for a simple multiplier model is here: RESPINOW-Hub/code/baseline at main · KITmetricslab/RESPINOW-Hub · GitHub

Sounds great, yes, let’s find a time. We can loop in @samabbott as well, as he has more of the historical context.

We’ve been going back and forth a bit on the Google Doc. Let me know if you can’t access it, and feel free to leave any thoughts you might have.

Thanks for the code, I had been looking at the older version.

Excited to kick this back off. Johannes, for some more context: some collabs in the US have also been developing and using their own baseline model, so the idea would be to fuse those together, make it into a tool that is really easy for others to use, and write up an eval to guide them in doing so. Finally, we would do some work to support people in the US to deploy this (maybe with webR woooooo).

Code is looking pretty swanky these days. I had remembered the split datasets for estimation and nowcasting but not the location of the code.

I have no specific suggestions, but just chipping in to say it would be great to see a reference model and keen to support in any way we can.

We (mostly Maria Tang) spent quite a lot of time over the last few months going back and forth on different baseline approaches for our measles nowcast without a satisfying conclusion.

Part of the back-and-forth was trying to work out what our epi colleagues would do, in the absence of a nowcast model, to communicate the backfilling, and then trying to model that, but again no nice answer.

Is there anything in the public domain on this, or anything that you can communicate directly?

Great to see many people are interested in this :slightly_smiling_face: What is the easiest way to agree on a meeting time? As we are split between the US East Coast and Europe, I guess we’ll need a slot in the morning Eastern Time / afternoon Europe. Maybe it’s sufficient to vote via emojis. Here are my options for the coming days:
:snowflake: Friday 24 January say 9am ET / 3pm Germany / 2pm UK
:sunny: same on Tuesday 29 January
:2nd_place_medal: both work
I think this site only supports one emoji per person, so can’t propose more options right now :upside_down_face:

Baselines considered:

  • no correction baseline (naïve but not realistically what epi colleagues would expect)
  • cut off last few weeks and repeat the previous week(s) (i.e. thinking that the data will be about the same)
  • cut off last few weeks and forecast (i.e. not trusting data in last few weeks, but assuming a continuing trend)
  • simple nowcast using a delay distribution convolution, based on simplest nowcast in NFIDD course (adjusting for delays but not any trends)

We decided to go for cut off and forecast (with a GAM) as a baseline for our measles nowcast, because we thought that might best represent what epi colleagues might try to do in their heads to adjust when looking at data without a nowcast.
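
For anyone who wants to picture the two cut-off baselines, here is a minimal sketch in Python. The function names and the `n_cut` / `n_fit` parameters are made up for this example, and the forecast uses a straight-line fit as a simple stand-in for the GAM we actually used, so treat it as an illustration of the idea rather than our implementation.

```python
import numpy as np


def cutoff_and_repeat(counts, n_cut):
    """Drop the last n_cut points and repeat the last trusted value."""
    counts = np.asarray(counts, dtype=float)
    if not 0 < n_cut < len(counts):
        raise ValueError("n_cut must be between 1 and len(counts) - 1")
    trusted = counts[:-n_cut]
    return np.concatenate([trusted, np.full(n_cut, trusted[-1])])


def cutoff_and_forecast(counts, n_cut, n_fit=4):
    """Drop the last n_cut points and extrapolate a linear trend.

    The trend is fitted to the last n_fit trusted points (a stand-in for
    a GAM) and forecasts are truncated at zero.
    """
    counts = np.asarray(counts, dtype=float)
    if not 0 < n_cut < len(counts) - 1:
        raise ValueError("need at least two trusted points after the cut-off")
    trusted = counts[:-n_cut]
    n_fit = max(2, min(n_fit, len(trusted)))
    x = np.arange(len(trusted) - n_fit, len(trusted))
    slope, intercept = np.polyfit(x, trusted[-n_fit:], deg=1)
    future = np.arange(len(trusted), len(counts))
    forecast = np.maximum(slope * future + intercept, 0.0)
    return np.concatenate([trusted, forecast])


# Illustrative weekly counts where the last two weeks are still backfilling.
weekly = [10, 12, 14, 16, 9, 3]
repeated = cutoff_and_repeat(weekly, n_cut=2)
forecasted = cutoff_and_forecast(weekly, n_cut=2)
```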

Also, one of our PhD interns did a bit of a lit review on baselines in the nowcasting literature, which I can send over via email.

This would be amazing.

Ah fun, glad to see this getting used.

We decided to go for cut off and forecast

I think this is quite reasonable. Not sure how/if to include this type of approach in this evaluation though, as it obviously requires a forecasting model (and assumes in some sense that the end point is a forecast, which it may not always be).

@johannes I can do any of those times.

I had to document this anyway for the supplement of a paper where we are using it, so here’s a summary of how our simplistic method works in a standard case of weekly data and weekly data releases. A similar description is also available in the supplement of Wolffram et al., but it’s a bit tedious to read as we had to deal with sums over seven-day windows.

On a side note, I think I may have an elegant solution to the issue of applying multiplication factors to zeros. But it’s essentially something a collaborator suggested for a related problem in a different project, so I can’t share it here without checking back with them first.



I can do any of those times as well!

Great, then let’s just go with Friday 24 January 9am ET / 3pm Germany / 2pm UK. @samabbott is there some Zoom channel you usually use for meetings? Otherwise I can send a link.
