Lots of GP models + nowcasting

Automatic kernel selection with GPs

I’m a really big fan of the package AutoGP.jl (GitHub - probsys/AutoGP.jl: Automated Bayesian model discovery for time series data), which does automated Gaussian process kernel selection in a particularly clever way.

Way back in the day I was interested in particle filters/SMC methods and their interface to parameter inference (i.e. pMCMC etc.). AutoGP does really clever multi-stage inference powered by the Gen.jl PPL:

  • The outer layer is an ensemble of particles. Each particle represents a GP model with a particular kernel structure and parameters. As you pass in more data you reweight and resample the ensemble (standard particle filter technique).
  • In between new data ingestion you can propose new kernel structures according to a really neat walk on a graph of kernel structures. It’s not as simple as proposing to go from, say, SE → Linear; you can make those proposals, but also combination moves kernel * new_kernel, kernel + new_kernel and CP(kernel, new_kernel), the last of which proposes a change point in the validity of one auto-covariance structure in favour of another. These get proposed using a specialised MH step which wraps HMC steps for the continuous parameters of the proposed discrete structures.
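
To make those moves concrete, here is a minimal sketch of the kernel grammar being walked over (toy types for illustration only, not AutoGP’s internals):

```julia
# Toy version of the kernel grammar that structure proposals walk over.
abstract type Kernel end
struct SE <: Kernel end        # squared-exponential leaf
struct Linear <: Kernel end    # linear leaf
struct Sum <: Kernel; l::Kernel; r::Kernel end      # kernel + new_kernel
struct Product <: Kernel; l::Kernel; r::Kernel end  # kernel * new_kernel
struct CP <: Kernel; l::Kernel; r::Kernel; loc::Float64 end  # change point

# One discrete structure move: swap a leaf or wrap the current kernel in a
# composition; the continuous parameters of the proposed structure (e.g. the
# change-point location) are then refined by the HMC-within-MH step above.
propose_move(k::Kernel, new_leaf::Kernel) = rand(Kernel[
    new_leaf,                # e.g. SE → Linear
    Sum(k, new_leaf),        # additive composition
    Product(k, new_leaf),    # multiplicative composition
    CP(k, new_leaf, rand()), # change point between structures
])
```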

AutoGP has a really nice API to do the above on a big chunk of data or in small sequential chunks.
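
For flavour, the API looks roughly like this (following the AutoGP.jl README from memory, so treat exact function and keyword names as assumptions):

```julia
using AutoGP

# Fit on a big chunk of data (ds are timestamps, y observations).
model = AutoGP.GPModel(ds, y; n_particles=8)
AutoGP.fit_smc!(
    model;
    schedule=AutoGP.Schedule.linear_schedule(length(ds), 0.10),
    n_mcmc=75,  # structure-move MCMC steps per SMC stage
    n_hmc=10,   # HMC steps on continuous parameters per structure move
)

# ...or keep ingesting new observations sequentially.
AutoGP.add_data!(model, ds_new, y_new)

# Forecast from the particle ensemble with predictive quantiles.
forecasts = AutoGP.predict(model, ds_future; quantiles=[0.025, 0.5, 0.975])
```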

Nowcasting

The big problem from an epi-application point of view is that it doesn’t have a “natural” way to include nowcast modelling. AutoGP is heavily specialised into being the best it can be in the restricted domain of pure time series modelling, i.e. no covariates and no multidimensional inputs like x = (reference_date, report_date).

To get around this I’ve developed NowcastAutoGP with my Center for Forecasting and Outbreak Analytics hat on (OK, I actually only have one hat): GitHub - CDCgov/NowcastAutoGP: Combining AutoGP (Gaussian process ensembles with kernel structure discovery) with data revisions.

The idea is pretty simple: NowcastAutoGP ingests nowcast samples from any model that can generate them, and uses the AutoGP sequential data ingestion API to batch forecasts over the set of sampled nowcasts.

The upside is that this is very flexible: you can choose your favourite nowcasting model to generate nowcasts and pipe them into AutoGP’s handy forecast tooling. The downside is that because it’s not a joint nowcast/forecast model, the posterior distribution of the nowcasts is not influenced by the likely trajectory of the GP models… which opens you up to misspecification in extreme examples.

EDIT:

Since @samabbott wanted a bit more context: the basic idea of SMC is to evolve a distribution towards a target distribution. This can take many forms, but the most common are things like Kalman filters (the distribution is known to remain Normal, so you only need to update mean vectors and covariance matrices) and particle filters (which use the same idea as samples in MCMC, but you increment all the particles rather than sampling a chain of them).
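
As a generic illustration (nothing AutoGP-specific, just the textbook bootstrap filter), a single particle-filter update on a new datum looks like:

```julia
# One bootstrap particle-filter step: reweight each particle by the likelihood
# of the new datum, then resample if the ensemble has degenerated.
function pf_step!(particles, logw, y_new, loglik)
    logw .+= [loglik(p, y_new) for p in particles]  # reweight on new data
    w = exp.(logw .- maximum(logw))
    w ./= sum(w)
    ess = 1 / sum(abs2, w)                          # effective sample size
    if ess < length(particles) / 2                  # resample when degenerate
        c = cumsum(w)
        idx = [min(searchsortedfirst(c, rand()), length(c)) for _ in particles]
        particles .= particles[idx]
        logw .= 0.0                                 # reset to equal weights
    end
    return particles, logw
end
```

In AutoGP each “particle” is, roughly speaking, a whole GP model (kernel structure plus parameters), and the reweighting uses the likelihood of the newly ingested data under each model.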

The upside of SMC is that a common way of incrementing the particles is to observe new data; this corresponds to the vibey term “updating your priors”. So in the case where your nowcast data is short compared to your longer time series of stable reports, this is very convenient. Think:

  1. Learn everything you can about the stable reporting past.
  2. Cache that.
  3. Batch a set of new possible learnings over your nowcast ensemble, each incrementing your cached learning.
  4. In each batch, do a forecast.

This corresponds to the usual posterior predictive modelling.
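
In pseudo-Julia (hypothetical variable names, AutoGP calls as remembered, so treat signatures as assumptions), steps 1–4 look something like:

```julia
# 1. Learn everything you can from the stable reporting past.
model = AutoGP.GPModel(ds_stable, y_stable; n_particles=8)
AutoGP.fit_smc!(model; schedule=AutoGP.Schedule.linear_schedule(length(ds_stable), 0.10))

# 2. Cache that fit.
cached = deepcopy(model)

# 3. + 4. Increment the cached fit over each sampled nowcast, forecasting each time.
all_forecasts = map(nowcast_samples) do nc
    m = deepcopy(cached)              # fresh copy of the cached learning
    AutoGP.add_data!(m, nc.ds, nc.y)  # increment on one nowcast draw
    AutoGP.predict(m, ds_future; quantiles=[0.025, 0.5, 0.975])
end
# Pooling all_forecasts across nowcast draws gives the predictive ensemble.
```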

In an MCMC approach you could do this, but it would be very computationally painful depending on your model, since 2000 nowcasts would imply running the MCMC over 2000 effective datasets (each including all the stable past data points). Using MCMC you’d be much better off working harder to create a proper joint model of the latent process, eventual reports and current reports… but that is such a hard challenge you’d need an advanced code base, a community, maybe some seminars and a forum to discuss such a difficult modelling challenge :wink: .

Significant downsides to the NowcastAutoGP approach

Unfortunately, the convenience above comes with plenty of costs:

  • The nowcasts arrive as part of a “pipeline” analysis, which is convenient but isn’t a joint model of process and reporting. I can easily imagine cases where this goes wrong.
  • This is really fast when you only do the “outer loop” of particle reweighting, and gets successively slower as you add more particle refresh steps.
  • It relies on Gen and I’m having a few esoteric issues with deepcopy over various model objects.

Full SMC?

The above kind of suggests that some bright spark should do a full model with SMC inference rather than this kind of fast glue project… back to you @samabbott


Will respond in more detail, but for interested readers it might be good to edit to make it clear why SMC is a nice thing to have when you want to batch forecasts over a bunch of nowcasts (what an idea - someone deserves a raise).

i.e. this bit might not be immediately clear to people!

I’m also really excited about some of the potential for these kinds of streaming kernel composition problems. There is a very clear piece of work extending this to handle multiple dimensions of data, which would open up a lot of potential, and another, more epi-related one, thinking about things like the renewal process and convolutions as GP kernels.

(what an idea - someone deserves a raise).

Hahahaha… this did indeed arise from me mentioning I’d do importance sampling over forecasts relative to a LogNormal over recent-week reporting multipliers, and then Sam said why not batch them over nowcasts… and there you go.


There is a very clear piece of work extending this to handle multiple dimensions of data

Yes, IMO the key step here is some kind of clever proposal mechanism to avoid getting lost in “kernel space” as the number of dimensions gets larger. This was the underlying reason that gradient-based proposals came to dominate MCMC samplers; but there it is a differentiable problem rather than a structural one.


Thank you, this is a great explanation.

TLDR: SMC lets you model the majority of the data once and then only forecast repeatedly on the more recent uncertain data. This is a big win.

Are you aware of any useful work on this?

Are you aware of any useful work on this?

Yes, sircovid (IIRC) from Imperial uses SMC ideas under the hood, e.g. see this from their underlying dependency mcstate: Restarting pMCMC • mcstate

I found this package/paper interesting back in the day: GitHub - geirstorvik/smc.covid

There was/is a Julia project for doing SMC R_t estimation but I’m having super brain fade.

This 2012 paper about using a variant of the ensemble Kalman filter to forecast flu was interesting (IIRC it was my original entry point to this).


I think the popularity of MCMC comes down to:

  1. If you can define a model in Stan you can use NUTS, which is not only a great sampler but also a direct path to your inference goal.
  2. Related to 1., you really do need to make active inference decisions to use SMC (in a lot of cases). The beauty of AutoGP is that it has strong opinions about inference (nest HMC inside MCMC inside SMC) which work very well in a restricted domain, but in other domains might not.

Although, having said that, I suspect that if we could do a big survey of Bayesian methods over all predictive modelling (e.g. weather, defence applications, etc.) then the Kalman filter might be the most popular.


That is pMCMC though, right, not SMC squared, and so you can’t do a lot of the nice real-time things.

you can define a model in Stan you can use NUTS

I think it’s also the model flexibility, i.e. here precisely what you can do is very locked down.


Yeah, but it’s illustrative of how many options are available (which I think can be a bit daunting / hard to write an easy package for).

In the sircovid context the SMC is inside the MCMC routine to provide the marginal likelihood (I think, it’s been a while!), whereas in AutoGP SMC is the outer loop and MCMC/HMC provide a way to propose new particle states.

Without getting my deep-thinking cap on, anything with an outer loop of SMC has the nice real-time properties that would be interesting here.


Yes, this is what they do and what people most commonly do.