I’ve discussed this with @samabbott and we think it would make sense to include a very simple reference model in the package. It should fulfil the following:
- runs instantly
- no MCMC or other fancy stuff that can go wrong
- easy to understand
- provides nowcast intervals
- nowcasts not horrible
The KIT-simple_nowcast model I set up for covid19nowcasthub.de/ meets these criteria more or less. See the description here: hospitalization-nowcast-hub/kit-simple_nowcast.pdf at 6c2dbf3aa7614a5c9fbb3d837e164cac9a39025a · KITmetricslab/hospitalization-nowcast-hub · GitHub. R code: hospitalization-nowcast-hub/code/baseline at main · KITmetricslab/hospitalization-nowcast-hub · GitHub. Essentially, it uses simple multiplication factors to get point nowcasts and then uses past nowcast errors to quantify the uncertainty (there’s a little twist in there to avoid discarding recent nowcast-observation pairs).
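To make the idea concrete, here is a minimal sketch of a multiplication-factor nowcast with intervals taken from past nowcast errors. It is written in plain Python for illustration only and is not the KIT-simple_nowcast R code: the function names, the ragged-list reporting triangle, and the chain-ladder-style factors are all assumptions, and the "little twist" for recent nowcast-observation pairs is not reproduced.

```python
def cumulate(row):
    """Turn incremental counts by delay into cumulative counts."""
    total, out = 0, []
    for x in row:
        total += x
        out.append(total)
    return out


def point_nowcast(triangle, max_delay):
    """Scale the latest cumulative count of each date by the remaining factors.

    triangle[i] holds the counts for reference date i by delay 0, 1, ...;
    recent dates have fewer entries because later delays are not yet observed.
    """
    cum = [cumulate(row) for row in triangle]
    factors = []
    for d in range(max_delay):
        rows = [c for c in cum if len(c) >= d + 2]  # delays d and d + 1 both seen
        den = sum(c[d] for c in rows)
        factors.append(sum(c[d + 1] for c in rows) / den if den > 0 else 1.0)
    nowcasts = []
    for c in cum:
        est = float(c[-1])
        for d in range(len(c) - 1, max_delay):
            est *= factors[d]
        nowcasts.append(est)
    return nowcasts


def error_ratios(triangle, max_delay):
    """Ratios final / nowcast for past same-day nowcasts whose truth is now known."""
    ratios = []
    for s in range(max_delay + 1, len(triangle) + 1):
        target = triangle[s - 1]
        if len(target) < max_delay + 1:
            continue  # final value for this date is not yet complete
        # Reconstruct what the triangle looked like when only s dates were known.
        past = [row[: s - i] for i, row in enumerate(triangle[:s])]
        est = point_nowcast(past, max_delay)[-1]
        if est > 0:
            ratios.append(sum(target) / est)
    return ratios


def nowcast_interval(point, ratios, lower=0.05, upper=0.95):
    """Crude empirical interval: point nowcast times quantiles of past ratios."""
    if not ratios:
        raise ValueError("need at least one past nowcast error")
    rs = sorted(ratios)

    def pick(q):
        return rs[min(len(rs) - 1, int(q * len(rs)))]

    return point * pick(lower), point * pick(upper)


# Toy data (made up): maximum delay of 2, last two dates still incomplete.
triangle = [[50, 30, 20], [100, 60, 40], [50, 30, 20], [150, 90, 60], [100, 60], [50]]
nowcasts = point_nowcast(triangle, max_delay=2)
print(nowcasts[-1], nowcast_interval(nowcasts[-1], error_ratios(triangle, 2)))
```

Everything here is closed-form arithmetic on the reporting triangle, so it runs instantly and there is no sampler that can fail.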
What would need to be done is more or less the following:
- adapt to same input format and other conventions of the package
- make a bit more robust and general, e.g., add options for daily data vs seven-day sums
- document properly
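As a rough illustration of the daily vs seven-day sums option mentioned in the list above, a switch could look something like the following. This is only a sketch in plain Python; `prepare_counts` and its `seven_day_sums` argument are hypothetical names, not part of any existing package.

```python
def rolling_sum(daily, window=7):
    """Trailing sums; positions without a complete window are None."""
    out, running = [], 0
    for i, x in enumerate(daily):
        running += x
        if i >= window:
            running -= daily[i - window]  # drop the value that left the window
        out.append(running if i >= window - 1 else None)
    return out


def prepare_counts(daily, seven_day_sums=False):
    """Hypothetical option: return daily counts as-is or as seven-day sums."""
    return rolling_sum(daily, window=7) if seven_day_sums else list(daily)


daily = [3, 5, 2, 4, 6, 1, 0, 7, 2, 3]  # made-up counts
print(prepare_counts(daily))
print(prepare_counts(daily, seven_day_sums=True))
```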
I’ll have a bit more time on my hands from September onwards, so I should be able to contribute this in the not-too-distant future. As I’m not the most experienced at contributing to complex packages, I might need some help there.
Our first post, how exciting! Thanks for putting this out in public.
I think this is a really great idea.
As I mentioned, @teojcryan and I were discussing a baseline model for his project, and what we came up with in the end sounded awfully like a less nicely thought-out version of your approach.
Given that, and that I imagine people are often going to want a baseline model, it makes a lot of sense to support this one. We have also seen that it performs pretty well in the Germany nowcasting hub, so it is important not to minimise its potential as something to be used in its own right.
Some other things to consider are the following:
- Should this be instead of or as well as the current epinowcast model? One option here would be to always produce a baseline and store it in the returned value from epinowcast, for example. Alternatively, it could be a drop-in replacement. Whatever the choice, it likely needs to work on its own for settings where it is good enough, Stan isn’t available, or another model is being evaluated.
- How do we integrate the output into the current plotting functionality? This depends a bit on the above, in that we might want both nowcasts to be plotted, but again it likely needs to work on its own.
- At some point we may want to support multiple nowcasting methods/back-ends. Do we want the baseline model to be an example of how to add one? I am thinking likely not, at least for the first pass, as it is somewhat of an exception and this would also add quite a bit of complexity.
As @johannes says, the first task is likely to integrate his baseline model with the preprocessing happening in the package and make its output link up with the low-level plotting functionality etc. From there we can think about integration.
It sounds like some support with how to integrate into a package would be useful. If anyone wants to volunteer some time that would be great, but I am also very happy to help.
With a little more thought, implementation-wise it might be nice to have a function like add_baseline() that adds a baseline nowcast to an enw_preprocess_data object (and so also to the output from epinowcast()). That just leaves integrating into plotting on both a low level and a high level (i.e. developer functions vs the S3 interface).
Like this idea, both!
Another thing that could make sense to have explicitly is a “naïve nowcast”, i.e. simply not doing any nowcasting and just returning the cases observed up until present. If we extend epinowcast with forecasting capability, this can be turned into the naïve forecasting approach described by Hyndman and Athanasopoulos here.
I used something like this before to assess the added value of nowcasting vs. no nowcasting at all. It can also be used in some kind of scaled MAE.
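A minimal sketch of what this could look like, in plain Python for illustration. The function names and the ragged-list reporting triangle are assumptions, and dividing the model's MAE by the naïve nowcast's MAE is just one possible reading of "some kind of scaled MAE".

```python
def naive_nowcast(triangle):
    """No nowcasting at all: return the counts reported so far for each date."""
    return [sum(row) for row in triangle]


def mae(pred, truth):
    """Mean absolute error between predictions and final observed values."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def scaled_mae(model_pred, naive_pred, truth):
    """MAE of a model relative to the naive nowcast; below 1 means added value."""
    naive_err = mae(naive_pred, truth)
    if naive_err == 0:
        raise ValueError("naive nowcast has zero error, ratio is undefined")
    return mae(model_pred, truth) / naive_err


# Toy data (made up): counts by delay, with the two latest dates still incomplete.
triangle = [[50, 30, 20], [100, 60], [50]]
final_counts = [100, 200, 100]   # what was eventually reported
model_nowcast = [100, 190, 110]  # output of some hypothetical nowcasting model
print(scaled_mae(model_nowcast, naive_nowcast(triangle), final_counts))
```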
That is a great idea @adrianlison.
In terms of the baseline model, the code is already very clean and functional. Thinking about this a bit more, it seems like a shame to force users to install Stan etc. if all they want is to run this model. Instead of adding to epinowcast, what do we think of making a new package, perhaps called baseline (?), in the epinowcast org to host this?
In the first instance, it can be designed to work with the data processing as available in epinowcast but also to work on its own. This is probably our direction of travel anyway, with it likely being sensible in the future to remove data pre- and postprocessing from the current package.
I am not immediately sure what this would mean for plotting etc. (i.e. would you make a plotting function in baseline with the idea of moving it out in the future, or adapt the output from the model to work with the current
We could also add a naive model to this package (i.e. do nothing), as @adrianlison suggests, and have that in the same format, which would make future general integration easier.
What do people think of this (in particular @johannes)? If this is the preferred option, I can make some progress on packaging this and we can go from there. Plotting can be the last problem to resolve, as most users can make their own plots (and likely will) and planning how to integrate lots of things that don’t yet exist is very hard.
We had a short discussion of options at today’s short-notice meeting. Minutes here: Short-dated epinowcast meeting, 2022-09-30 - #6 by alison
Had a chat with @samabbott about how this overlaps with some work we have in SA (basically, generating varying synthetic time series corresponding to a few scenarios), and it might be that this approach is better than the work that’s already been done. I will have a look and evaluate the prospects of packaging it into something sleek for generating synthetic series.