Modularity via a Stan model generator

This is a proposal for a large-scale refactor that is not part of the current MVP development timeline. It will need substantially more discussion and potentially significantly more resources than we currently have. I’ll update this initial post with more detail in the near term.

The current Stan model implementation is built in a similar way to rstanarm in that it is monolithic, with all components integrated but optional. This is nice as it gives a single model to compile and makes it easier for us to avoid duplicating code. However, it makes it much harder for contributors to add components without a relatively good understanding of the majority of the model components. This is particularly true in the likelihood. It also makes it harder (but not impossible) to reuse model elements elsewhere. We have tried to deal with this issue by making the code very functional but clearly more needs to be done.

Another design option is to generate the Stan code rather than maintain a single model (an approach that generalises to other backends). An example of this is brms. This could be a completely custom implementation or it could make use of brms functionality (which could replace much or all of our current formula code).
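To make the generating approach concrete, here is a minimal sketch in Python. It is not how brms works and not an existing prototype. The `Component` class, the `generate_stan` function and the two example components are all hypothetical names made up for illustration. The idea is only that each component declares the Stan statements it needs per program block, and a generator merges them in the order Stan requires.

```python
from dataclasses import dataclass, field

# Order in which Stan requires program blocks to appear.
BLOCK_ORDER = [
    "functions",
    "data",
    "transformed data",
    "parameters",
    "transformed parameters",
    "model",
    "generated quantities",
]


@dataclass
class Component:
    """Hypothetical model component: maps a Stan block name to its statements."""
    name: str
    blocks: dict = field(default_factory=dict)


def generate_stan(components):
    """Merge components into one Stan program, emitting only the blocks in use."""
    merged = {block: [] for block in BLOCK_ORDER}
    for comp in components:
        for block, lines in comp.blocks.items():
            if block not in merged:
                raise ValueError(f"{comp.name}: unknown Stan block '{block}'")
            merged[block].extend(lines)
    parts = []
    for block in BLOCK_ORDER:
        if merged[block]:
            body = "\n".join("  " + line for line in merged[block])
            parts.append(f"{block} {{\n{body}\n}}")
    return "\n".join(parts)


# Illustrative components only (assumed, not taken from our model).
observations = Component("observations", {
    "data": ["int<lower=0> N;", "array[N] int<lower=0> y;"],
    "model": ["y ~ poisson_log(eta);"],
})
intercept = Component("intercept", {
    "parameters": ["real alpha;"],
    "transformed parameters": ["vector[N] eta = rep_vector(alpha, N);"],
    "model": ["alpha ~ normal(0, 5);"],
})

stan_code = generate_stan([observations, intercept])
print(stan_code)
```

In a design like this a contributor would only need to understand the component they are adding plus the names it shares with others (here `N` and `eta`). Managing those shared names safely is the hard part that this sketch leaves out.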

Making use of brms would be ideal as so much great work has already been done. However, there are several issues with this. One is that it makes it harder to change backends, though this is of course a stretch goal. Another is that it is difficult to do. I have tried two approaches in the past:

  • Using the custom code interface in brms
  • Generating Stan code with brms and then modifying it

Both of these turned out to be quite hard. The first gave a restricted subset of models and was quite hard to understand (which may again restrict contributors and reuse). It needed significantly more effort than I had available to make it work. It would also likely not be possible to use as much optimisation as we currently have due to package restrictions. The second worked well for individual models but was quite brittle and so hard to generalise into a tool.
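As a rough illustration of why the second approach can be brittle, the Python sketch below patches a statement in generated Stan code by string replacement. The two Stan snippets are hand-written stand-ins, not real brms output, and `patch_likelihood` is a hypothetical helper. The assumption is that the generated code for a likelihood can change shape between models, so a patch written against one form fails on another.

```python
def patch_likelihood(stan_code, old, new):
    """Swap one statement in generated Stan code, failing loudly if it is not found once."""
    if stan_code.count(old) != 1:
        raise ValueError(f"expected exactly one match for {old!r}")
    return stan_code.replace(old, new)


# Hand-written stand-ins for two forms a generator might emit for a Poisson model.
generated_v1 = "model {\n  target += poisson_log_lpmf(Y | mu);\n}"
generated_v2 = "model {\n  target += poisson_log_glm_lpmf(Y | Xc, Intercept, b);\n}"

old = "poisson_log_lpmf(Y | mu)"
new = "neg_binomial_2_log_lpmf(Y | mu, phi)"

# The patch works on the form it was written against.
patched = patch_likelihood(generated_v1, old, new)

# The same patch fails on the second form because the statement has a different shape.
try:
    patch_likelihood(generated_v2, old, new)
    failed_on_v2 = False
except ValueError:
    failed_on_v2 = True
```

Each new model form needs its own patch, which is what makes this hard to turn into a general tool.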

Going for a custom Stan generator is perhaps the easiest option. I have a partial prototype for this but it is again non-trivial to make it work well and be easy to use. In many ways it is a domain-independent project, so ideally it would be dealt with by others in the Stan community with more resources, but I haven’t seen much happening in this direction.

An alternative to this project is changing completely to a different backend as the primary modelling option. Other potential tooling, mostly in Python and Julia, has much richer programming tools available and so could be a better option for an easy-to-use modular library of infectious disease model components.

A smaller-scale project, but one that would still have much value, is integrating tools from brms or similar packages to manage elements of our formula interface and potentially our Stan code. This would make a future switch to a model generator easier. It could also help us add useful features such as Gaussian processes, which we would otherwise have to implement ourselves, and that is non-trivial.

I’m very interested to hear what people think about this, why it doesn’t exist elsewhere and potential options for making it happen.