Renewal models as neural networks

My first MSc was in neural networks, back in the mists of time (2006), so I’ve always had an interest in them, without ever really going down an ML-maxing path in my research.

An aspect of epidemiological renewal models is that the underlying infection-generating process

I(t) = R(t) \sum_{s\geq 1} g(s) I(t-s)

can be thought of as a sequence of two-step operations:

  • A weighted sum over a “hidden” or carried state of recent infections
  • Multiplication by the $t$th “input” value, R(t)

Generating this process for T time steps from an initial hidden state and an input vector R(t) involves a T-fold recursion. This is precisely the kind of modelling implemented by a recurrence layer in a neural network (see Recurrent neural network - Wikipedia).
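As a hedged sketch (not the author's actual code, which uses Lux.jl), the two-step recursion above can be unrolled in plain Python/numpy; the list of recent infections plays the role of the recurrent layer's carried hidden state:

```python
# Illustrative sketch of the renewal recursion
#   I(t) = R(t) * sum_{s>=1} g(s) I(t-s)
# unrolled for T steps, mirroring a simple recurrent-layer cell update.
import numpy as np

def renewal_recursion(R, g, I_init):
    """R: length-T input vector of reproduction numbers.
    g: generation interval weights, g[k] ~ g(s=k+1).
    I_init: the last len(g) infections, most recent last (the hidden state)."""
    I = list(I_init)
    out = []
    for Rt in R:
        # step 1: weighted sum over the carried state of recent infections
        force = sum(g[k] * I[-(k + 1)] for k in range(len(g)))
        # step 2: multiply by the t-th "input" value R(t)
        new_I = Rt * force
        I.append(new_I)  # update the hidden state
        out.append(new_I)
    return np.array(out)
```

With `g = [0.5, 0.5]` and a flat initial state of 10 infections, `R = [1.0, 2.0]` generates 10 then 20 new infections, as you'd expect from doubling R.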

Going from latent infections to the expected number of observations in a simple model is just a one-dimensional convolution

O(t) = \left( I * f \right)(t)

Convolution layers are also standard in NN libraries.
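A minimal sketch of that observation step, assuming `f` is a delay distribution (e.g. infection-to-report) and truncating the convolution to the length of the infection series:

```python
# Illustrative sketch: expected observations as a 1-d convolution
#   O(t) = sum_s f(s) I(t - s)
import numpy as np

def expected_observations(I, f):
    """I: latent infection series; f: delay weights.
    np.convolve returns len(I) + len(f) - 1 terms; keep the first len(I)
    so O is aligned with the infection time series."""
    return np.convolve(I, f)[: len(I)]
```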

Neural networks are designed as compositions of input-output layers (functional composition), so the simple renewal model is just a chain of commonly used NN layers with a Poisson observation link at the end.
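Putting the pieces together, here is a hedged end-to-end sketch of that chain in plain Python/numpy (the parameter values are made up for illustration): a renewal "layer" composed with a convolution "layer", with a Poisson draw at the end playing the role of the observation link.

```python
# Illustrative composition: R -> latent infections -> expected obs -> Poisson
import numpy as np

def renewal_layer(R, g, I_init):
    # recurrence: I(t) = R(t) * sum_k g[k] * I(t - 1 - k)
    I = list(I_init)
    for Rt in R:
        I.append(Rt * sum(g[k] * I[-(k + 1)] for k in range(len(g))))
    return np.array(I[len(I_init):])

def conv_layer(I, f):
    # 1-d convolution with a delay distribution, truncated to len(I)
    return np.convolve(I, f)[: len(I)]

rng = np.random.default_rng(0)

# made-up parameters: constant R, short generation interval and delay pmfs
R = np.full(20, 1.2)
rate = conv_layer(renewal_layer(R, [0.4, 0.6], [5.0, 5.0]), [0.1, 0.6, 0.3])
obs = rng.poisson(rate)  # Poisson observation "link" at the end of the chain
```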

The main potential wins here are (IMO):

  • Neural networks are fun, have plenty of libraries, and naturally promote compositional modelling in terms of function composition (i.e. layers)
  • NNs are popular and have lots of compute tricks that people have spent a lot of time on (e.g. efficient adjoints, GPU kernels, optimised compute/AD libraries like Reactant.jl etc etc)

I have some really scrappy code I threw together quickly by shouting at Claude, which I’m not yet going to put on main or even in a “try things out” repo. It fuses the ideas I’ve outlined above using the NN library Lux.jl (my current favourite NN library in Julia).


This looks great Sam, and I agree there is a lot of potential here. As we threw back and forth in the composability work, Lux.jl and other NN packages also offer a ready-made solution to composability, and so are attractive in those terms.

Something else to note is that it is quite common to represent AR processes as NNs with time-varying parameters on the AR coefficients. As the renewal process is a constrained AR, that also suggests this is a natural thing to do (@OvertonC and I were chatting about this earlier today). You could also model Rt itself as a time-varying AR process, with the renewal as an AR whose generation time weights also evolve over time.
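To make the "renewal is a constrained AR" point concrete, here is a hedged sketch (names and setup are my own, for illustration): an AR process with time-varying coefficients reproduces the renewal recursion exactly when the coefficients are constrained to the rank-one form a_t(s) = R(t) g(s).

```python
# Illustrative sketch: renewal as a constrained time-varying AR process
import numpy as np

def tv_ar(coefs, x_init):
    """AR process with time-varying coefficients:
    x(t) = sum_s coefs[t][s] * x(t - 1 - s)."""
    x = list(x_init)
    for a_t in coefs:
        x.append(sum(a_t[s] * x[-(s + 1)] for s in range(len(a_t))))
    return np.array(x[len(x_init):])

# the renewal constraint: coefficients factorise as a_t(s) = R(t) * g(s)
R = np.array([1.0, 2.0])
g = np.array([0.5, 0.5])
coefs = [Rt * g for Rt in R]
x = tv_ar(coefs, [10.0, 10.0])  # identical to running the renewal recursion
```

Relaxing the constraint (letting `g` drift over time, or modelling `R` with its own AR) then corresponds to the generalisations described above.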

I think you should be able to represent ascertainment using this approach fairly easily as well, which is pretty neat.
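One hedged guess at how that might look (this is my own sketch, not anything from the thread's code): ascertainment as an elementwise "layer" that scales the expected observations by a time-varying reporting fraction, parameterised on the logit scale so it stays in (0, 1).

```python
# Illustrative sketch: time-varying ascertainment as an elementwise layer
import numpy as np

def with_ascertainment(expected_obs, logit_rho):
    """Scale expected observations by a reporting fraction rho(t) in (0, 1),
    parameterised via a logit so it is unconstrained for fitting."""
    rho = 1.0 / (1.0 + np.exp(-np.asarray(logit_rho)))
    return rho * np.asarray(expected_obs)
```

At `logit_rho = 0` (i.e. rho = 0.5) this halves the expected count at each time point.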

I agree on the potential benefits. It would be really interesting to explore this more. There is almost a ready-made study here, I think: implementing the case studies from New pre-pre-print: Composable probabilistic models can lower barriers to rigorous infectious disease modelling - #6 by samabbott in a NN (perhaps not the ODE one).

Something we have talked about before is how the accumulation approach we used in that work is literally what you need to do here (i.e. look at Sam’s NN code and then look at: Rt-without-renewal/EpiAware/src/EpiLatentModels/models/AR.jl at 69e1918c681985a7656779d991b82e7c8e96962a · seabbs/Rt-without-renewal · GitHub).

I’m not sure how much capacity you have but I certainly don’t have a lot. I would be very keen to support a project exploring this in more detail.

Yes, this is neat; it’s then just adding a layer before the “solve renewal” layer.


There is a connection here to mechanistic NNs (Mechanistic Neural Networks for Scientific Machine Learning), though I think those are mostly about embedding some kind of ODE system. Through that there is also a connection to UDEs (Universal differential equations in system Biology - #3 by samabbott), of course.


Nice links. I’d need more than a glance to be sure, but at a glance, putting the renewal equation solve in is like a simple special case of the mechanistic layer in the first linked paper.

The way I’m thinking about this is:

  • One route is to make the dynamics “NN-ified”; that’s this toy model and the first link. The upsides are unified, familiar compute frameworks plus potential under-the-hood optimisations.
  • Another route is to take a NN and “dynamic-ify” it. The upside is the huge amount of adaptive-solver work that has gone into ODE solvers and their adjoint systems. This was the original motivation of Neural Ordinary Differential Equations, i.e. why have a recurrent layer and try to tune the number of recurrence steps when you can outsource this to an adaptive solver?