Minutes epinowcast meeting: 2022-10-20
This meeting was led by @adrianlison and minutes were taken by @samabbott. Note: If anyone with edit access thinks anything was missed please adapt these minutes as you see fit.
See the minutes from last time here if missing context: Short-dated epinowcast meeting, 2022-09-30 - #6 by alison
Introductions for new community members
As usual we had a round of introductions for those attending their first meeting. Today these were @sangwoopark (read more about his work here: Sang Woo Park - intro) and @kathsherratt.
Organisational points
Recap from last time: no decisions in meetings. Rough consensus from discussion and then give people a chance to reply on the forum.
We decided that at the end of each meeting someone should volunteer to run the next one. This includes making an open post for the meeting on the day of the last meeting, making an agenda based on people's suggestions, and running the meeting itself. The volunteer to take minutes will be found on the day.
We discussed how it would be useful to have an organisation account for hosting video calls and discussed the potential for small sources of funding to cover this (as well as the general forum costs which are currently out of pocket).
Finally, we discussed running a seminar series in which invited presenters talk about real-time analysis issues, we lead workshops on potential use cases, and we do deep dives on areas of the code/model to stimulate research and motivate contributions. The thought was that this would run every month and be slightly more formal than our current meetings, which would continue to be monthly (so overall we would have a meeting of some kind every two weeks).
@samabbott will open a thread on this and lead organising the first round of speakers.
Updates on known users
Various contributors have been contacted by users working at multiple public health agencies on a range of tasks. Most recently these have been people responding to novel outbreaks. Much of this work is not in the public domain but the synthesis of requirements is similar to that discussed here: Nowcasting in a real-time analysis pipeline to estimate the effective reproduction number with missing data and reporting delay
In terms of meeting these requirements, once all proposed package changes are merged in from pull requests and the development version is released, we will only be missing non-parametric delays and the ability to forecast without custom code (though @Gunnar has a workaround for this discussed here: Forecasting in epinowcast - #7 by samabbott).
Another frequently raised point was the difficulty of fully understanding all of the options given the time available to deploy the package. To help with this it was suggested that the package could give more feedback, for example if the number of days of data is much greater than the maximum delay (as this will take a long time to run and is unlikely to bring much additional information, depending on the task at hand). Generally, we thought this was a good idea as long as it could be turned off by experienced users. One option that was floated was having a default verbose mode and an optional non-verbose mode.
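As a rough illustration of the kind of feedback discussed above, here is a minimal Python sketch of a check that warns when the data window is much longer than the maximum delay. The function name `check_data_window`, the `ratio` threshold of 3, and the `verbose` flag are all assumptions made for illustration; none of them are part of epinowcast.

```python
import warnings


def check_data_window(n_days, max_delay, ratio=3.0, verbose=True):
    """Hypothetical check: flag data windows much longer than the maximum delay.

    Returns True if the window looks reasonable and False otherwise.
    The default ratio of 3 is an arbitrary, illustrative threshold.
    """
    too_long = n_days > ratio * max_delay
    if too_long and verbose:
        # Feedback is only emitted in verbose mode so experienced users can opt out.
        warnings.warn(
            f"{n_days} days of data with a maximum delay of {max_delay} days: "
            "fitting may be slow and the extra data may add little information."
        )
    return not too_long
```

Calling `check_data_window(200, 20, verbose=False)` returns `False` without emitting a warning, which mirrors the idea of a non-verbose mode.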
A final common thread was the presence of reporting schedules (i.e. by day of the week). This was similar to @teojcryan's recent case study (Nowcasting COVID-19 cases by specimen date in England), which suggested we need to be able to easily specify constant priors for days with no reporting.
Epidemic phase bias and right truncation of delay distributions
@sangwoopark very kindly gave us an introduction to some of the issues with estimating delay distributions in real-time.
The critical bias he flagged as an example was that when an epidemic is growing quickly we are more likely to see shorter serial intervals due to truncation. One of the questions he is looking at is whether we need to account for this using transmission dynamics, truncation adjustment, or both.
His current results suggest that either approach works, though dynamic adjustment requires modelling the underlying transmission process. The two should not be used together.
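To make the bias concrete, here is a small simulation sketch in Python (standard library only). It is not @sangwoopark's method and implements neither adjustment; it only illustrates that delays observed before a cut-off look shorter when incidence is growing. The gamma delay with mean 5, the growth rate of 0.2 per day, and the 30-day observation window are illustrative assumptions.

```python
import math
import random


def mean_observed_delay(growth_rate, obs_time=30.0, n=5000,
                        mean_delay=5.0, shape=2.0, seed=1):
    """Mean delay among pairs whose secondary event occurs before obs_time.

    Primary event times have density proportional to exp(growth_rate * t)
    on [0, obs_time]; delays are gamma distributed (illustrative values).
    """
    rng = random.Random(seed)
    scale = mean_delay / shape
    observed = []
    while len(observed) < n:
        u = rng.random()
        if growth_rate == 0:
            primary = u * obs_time
        else:
            # Inverse CDF of the exponentially growing incidence curve.
            primary = math.log(
                1 + u * (math.exp(growth_rate * obs_time) - 1)
            ) / growth_rate
        delay = rng.gammavariate(shape, scale)
        # Right truncation: keep only pairs fully observed by obs_time.
        if primary + delay <= obs_time:
            observed.append(delay)
    return sum(observed) / len(observed)


flat = mean_observed_delay(growth_rate=0.0)
growing = mean_observed_delay(growth_rate=0.2)
```

Under these assumptions both means fall below the true mean of 5 days, and the mean under growth is lower than under flat incidence, because most primary events happen close to the cut-off and only their short delays are observed.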
Work on this is ongoing and there should be a paper giving more details in the near term (exciting).
Another useful example that was given is estimating the incubation period, where left truncation can be an issue due to recruiting people based on whether their onset has been observed. Similar methods can be used to correct for this, and again the message was to use only one.
We discussed the relationship between this work and nowcasting. In particular, we highlighted how being able to estimate these delays is crucial when using latent reporting delays which we will shortly support in the development version of the package.
From there we moved on to discussing the role of censoring, especially censoring of onsets, which are generally only known to daily resolution and often not even with that granularity. @sangwoopark is also exploring this and its interaction with truncation. I have also been playing around with methods to do this with the thought of providing tools for epinowcast users (see here (WIP): https://gist.github.com/seabbs/027cd1c439e8acf1d598cc03ef33aaa4).
We flagged the fact that epinowcast has all the tools to deal with this censoring in a robust fashion and so thinking about supporting this in the future would make sense.
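To illustrate why daily censoring matters, here is a short Python sketch (standard library only) that simulates continuous onset and report times and then records both only to the day. The exponential delay with mean 3 days and the 30-day onset window are illustrative assumptions, and this is not the approach in the linked gist.

```python
import math
import random


def simulate_daily_censoring(n=1000, mean_delay=3.0, seed=2):
    """Return (true_delay, daily_delay) pairs under daily censoring.

    Both the onset and the report time are only recorded as a day index,
    so the observed delay is a difference of two floored times.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        onset = rng.uniform(0.0, 30.0)
        true_delay = rng.expovariate(1.0 / mean_delay)
        report = onset + true_delay
        daily_delay = math.floor(report) - math.floor(onset)
        pairs.append((true_delay, daily_delay))
    return pairs


pairs = simulate_daily_censoring()
```

Because both endpoints are censored, an observed daily delay of `d` only tells us that the true delay lies somewhere between `d - 1` and `d + 1` days, a two-day window rather than a one-day one.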
Data package for nowcasting
Based on: Create a collection of benchmark data sets - #5 by nickreich
We discussed how useful it would be to have an easy way to find example data sets both for our own work and for other researchers. We also discussed that this could be a good second package for the organisation and be a nice entry point for contributors.
The general design we settled on was to have a remote store for the data (like Zenodo) and provide tools in R to download this data along with metadata. For this to work, we would need some kind of schema for how the data should be uploaded and for the kind of metadata we would need alongside.
In the first instance, the suggestion was to focus on data sets that could be considered “traditional” for nowcasting and then to expand to include other kinds of data sources (for example to estimate delays).
The first actions we need to take are to lay out the design in the related thread and come up with a prototype name. From there we can set up a skeleton repository in the organisation and move development into the issues there.
@kathsherratt expressed some interest (no pressure!) in taking a leading role in some of this work.
How can we make the model easier to understand?
We discussed a range of ways we can make the model, and the vision for the future, easier to understand. Something that got some traction was writing up a mission + motivation paper so that others can understand our planned development path and how we want to get there (i.e. as a community project). @FelixGuenther is going to make a post for further discussion on this.
The current roadmap is nearly documented in math in various PRs but this needs to be expanded on and cleaned up. We also discussed how useful a model schematic would be (@teojcryan's schematic was mentioned here).
Some issues were also raised with the fact that some parts of the current code base may be hard to understand. We concluded we could take a two-pronged approach to this.
- Use the wiki functionality of GitHub to map out the connections between the various models, pre and post processing.
- Highlight areas of the code that are currently hard to understand as GitHub issues and then flag these to contributors. Hopefully this will mean we can improve the commenting over time.
See here for post discussing mission statement: Mission statement/Model outline (manuscript/document)
Progress from last meeting
At the close of the meeting we briefly touched on progress from last meeting. The main issue that was raised is the regression in performance introduced with the switch to the renewal equation based approach (https://github.com/epinowcast/epinowcast/pull/152). Both @adrianlison and @samabbott have looked at this and the conclusion is that it is likely bug-free but that the log based convolutions may simply be slower than natural scale ones. Hopefully @samabbott will have time in the coming week to test further. @Gunnar also flagged that he needs this functionality for his surveillance use case so ideally it would be ready sooner rather than later.
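For readers unfamiliar with the distinction, here is a minimal Python sketch of the same convolution done on the natural scale and on the log scale. It is not the Stan code from the pull request; the function names are made up for illustration. It only shows why the log version plausibly costs more: every term needs an `exp` and every output needs a `log`.

```python
import math


def convolve_natural(values, pmf):
    """Convolve a series with a delay pmf on the natural scale."""
    out = []
    for t in range(len(values)):
        total = 0.0
        for d, p in enumerate(pmf):
            if d <= t:
                total += values[t - d] * p
        out.append(total)
    return out


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x))) for a non-empty list."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def convolve_log(log_values, log_pmf):
    """Same convolution with inputs and outputs on the log scale."""
    out = []
    for t in range(len(log_values)):
        terms = [log_values[t - d] + lp
                 for d, lp in enumerate(log_pmf) if d <= t]
        out.append(log_sum_exp(terms))
    return out


values = [10.0, 20.0, 40.0, 80.0]
pmf = [0.5, 0.3, 0.2]
natural = convolve_natural(values, pmf)
logged = convolve_log([math.log(v) for v in values],
                      [math.log(p) for p in pmf])
```

Exponentiating `logged` recovers `natural` up to floating point error, so the two versions differ in cost and numerical stability rather than in the result.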
We also raised the lack of recovery of simulated missingness in the missing reference date model. Since the meeting @sbfnk has made some progress here by simulating data using the model and then fitting to it. This potentially suggests an identifiability problem with the current parameterization but it is still being looked into (Bug: Missing data proportion recovery, Issue #165, epinowcast/epinowcast).
Planning the next meeting
- The next meeting is at 1pm UK time on the 17th of November.
- @FelixGuenther is running the next meeting and will make an open agenda post shortly.
- @samabbott will make a permanent meeting link.