CensoredDistributions.jl: Julia Meets Messy Outbreak Delay Distribution Data

Documentation: CensoredDistributions.epiaware.org

Warning: I wrote this leaning heavily on my LLM pal, hence it being so much more upbeat than my normal style!

We’re excited to share CensoredDistributions.jl, born from the 2024 Epistorm Rt collaborathon in Boston where most of our team gathered to tackle real-time epidemic parameter estimation. The package mirrors the R package primarycensored but brings Julia’s ecosystem to the world of censored distributions.

It tackles the messy reality of outbreak data, where delay distributions are built from events only known to occur within time windows: both the primary event (like exposure) and secondary observations (like daily reporting) introduce censoring that needs to be handled properly. It’s also interesting to compare implementations - Julia’s multiple dispatch turns out to be very nice for this problem, letting us cleanly handle different distribution types and censoring scenarios.
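To give a flavour of the dispatch point, here is a toy sketch in plain Julia. It is not the package's implementation: `Delay`, `ExpDelay`, `WeibullDelay`, `delay_cdf` and `primary_censored_cdf` are made-up names for illustration, and the primary event is assumed to be uniform over a window of length `w`. A generic method integrates numerically for any delay, while a more specific method supplies a closed form where one exists.

```julia
# Toy delay types for illustration only (hypothetical, not the package's types).
abstract type Delay end

struct ExpDelay <: Delay
    mean::Float64
end

struct WeibullDelay <: Delay
    shape::Float64
    scale::Float64
end

# CDF of each continuous delay.
delay_cdf(d::ExpDelay, t) = t <= 0 ? 0.0 : 1 - exp(-t / d.mean)
delay_cdf(d::WeibullDelay, t) = t <= 0 ? 0.0 : 1 - exp(-(t / d.scale)^d.shape)

# Generic fallback: average the delay CDF over a uniform primary event
# window [0, w] using the midpoint rule. Works for any Delay with a delay_cdf.
function primary_censored_cdf(d::Delay, w, t; n = 2_000)
    h = w / n
    return sum(delay_cdf(d, t - (i - 0.5) * h) for i in 1:n) / n
end

# Specialised method: the same integral has a closed form for an
# exponential delay, so dispatch picks this one for ExpDelay.
function primary_censored_cdf(d::ExpDelay, w, t)
    θ = d.mean
    t <= 0 && return 0.0
    t < w && return (t - θ * (1 - exp(-t / θ))) / w
    return 1 - (θ / w) * exp(-t / θ) * (exp(w / θ) - 1)
end

analytic = primary_censored_cdf(ExpDelay(2.0), 1.0, 3.0)
# Force the generic method on the same input to check the two agree.
numeric = invoke(primary_censored_cdf, Tuple{Delay, Any, Any}, ExpDelay(2.0), 1.0, 3.0)
weibull = primary_censored_cdf(WeibullDelay(1.5, 2.0), 1.0, 3.0)
println((analytic, numeric, weibull))
```

The calling code never needs to know which method ran, which is the appeal: new delay types get the numeric fallback for free and can opt in to something faster later.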

Try It Now

Want to see it in action? Here are a few lines that set up a temporary environment and run a demo with plots:

using Pkg
Pkg.activate(temp=true)
Pkg.add(["CensoredDistributions", "Distributions", "UnicodePlots"])
include(download("https://gist.github.com/seabbs/3cbd4e5bcdba30deec3081ba3838d556/raw/censoreddistributions-demo-gist.jl"))

Why Event Censored Distributions Matter

If you’re working with outbreak data, you’ll recognise this problem. Someone gets infected, develops symptoms days later, gets tested a few days after that, and the results take another few days to process. By the time you see the data, you’re looking at a chain of delays where each step introduces uncertainty about timing.

The Charniga et al. (2024) paper demonstrates just how important it is to handle this censoring and potential truncation properly. Get it wrong and your epidemic parameters will be biased, which matters when you’re trying to understand transmission dynamics or forecast case trends.

What is it?

CensoredDistributions.jl implements three main types of censoring:

Primary event censoring: When the initial event (like infection) happens within a time window, but you don’t know exactly when.

Interval censoring: When continuous events get binned into discrete time intervals (think daily case reports).

Double interval censoring: Combines both - because epidemiological delay distribution data often has both types of censoring.
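A small simulation can make the double interval censoring case concrete. This is a sketch in plain Julia rather than the package's API, and it assumes an exponential delay with a mean of 3 days, a primary event uniform within day 0, and daily reporting. It compares the simulated distribution of observed days with a naive discretisation that ignores the primary event window.

```julia
using Random

θ = 3.0                      # assumed mean of an exponential delay, in days
rng = MersenneTwister(2024)
n = 200_000

# Primary event falls uniformly within day 0; the secondary event is then
# only recorded as a whole day (double interval censoring).
primary = rand(rng, n)
delay = θ .* randexp(rng, n)
observed_day = floor.(Int, primary .+ delay)

# Share of observations landing on each day, from the simulation.
empirical(k) = count(==(k), observed_day) / n

# Naive alternative: discretise the continuous delay and ignore the
# uncertainty in the primary event time.
naive(k) = exp(-k / θ) - exp(-(k + 1) / θ)

for k in 0:4
    println(k, "  simulated: ", round(empirical(k); digits = 3),
            "  naive: ", round(naive(k); digits = 3))
end
```

Under these assumptions the naive version puts noticeably too much mass on day 0, because it treats every primary event as happening at the very start of the day.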

The package integrates with Distributions.jl and Turing.jl. See Fitting with Turing.jl · CensoredDistributions.jl for a pretty fun little tutorial. Loving DataFramesMeta.

What’s Missing (And What’s Coming)

The big missing piece is exponentially tilted priors (issue #62). These let you model primary events that happen at exponentially changing rates during epidemics. Without them, you’re assuming infections are equally likely throughout a censoring window, which isn’t realistic during exponential growth or decay.
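As a rough sketch of what exponential tilting means (plain Julia, not the package's API; `rand_tilted` and `tilted_mean` are hypothetical helper names, and the growth rate and window length are made-up values), the primary event time within a window `[0, w]` gets a density proportional to `exp(r * t)` instead of a flat one:

```julia
using Random, Statistics

# Sample a primary event time on [0, w] with density proportional to
# exp(r * t), by inverting the CDF. r = 0 recovers the uniform case.
function rand_tilted(rng, r, w)
    u = rand(rng)
    return abs(r) < 1e-8 ? u * w : log1p(u * expm1(r * w)) / r
end

# Analytic mean of the same tilted density, for comparison (assumes r != 0).
function tilted_mean(r, w)
    a = r * w
    return w * (a * exp(a) - exp(a) + 1) / (a * (exp(a) - 1))
end

rng = MersenneTwister(1)
r, w = 0.2, 7.0              # assumed growth rate per day and window length
draws = [rand_tilted(rng, r, w) for _ in 1:100_000]
println("uniform mean: ", w / 2)
println("tilted mean (simulated): ", mean(draws))
println("tilted mean (analytic):  ", tilted_mean(r, w))
```

With positive growth the primary events pile up towards the end of the window, so the implied delays are shorter than a uniform assumption would suggest; with negative `r` the effect reverses.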

We’re also planning to cover the same functionality as the epidist R package, including partially pooled delay distribution fitting. There’s an open issue for an Ebola outbreak analysis vignette that will demonstrate this. Due to the lack of a brms equivalent in Julia, this will have to be a bit more manual and clunky. If anyone is interested in working on a brms equivalent, we’d love it to happen and I (Sam) would be happy to help.

The plan is to then write a Turing extension package with models for each of the key distribution helpers we support and submodels for the underlying distributions. See the future plans for where this might be going (fun times).

Autodiff is also proving tricky. We’re finding that some distributions don’t play nicely with all the different autodiff packages (ForwardDiff, Zygote, Enzyme, etc.), though we’re still learning the ins and outs of this ecosystem. We’re adding tests using DynamicPPL’s run_ad function, but it would be nice to have more standardised tooling for this - maybe badges showing which autodiff backends work for a given package.

Turing vs Stan

We’ve been comparing Turing.jl and Stan for this work. Turing’s generative approach means you write models that look like data-generating processes, and it produces cleaner models with less code. The error messages can be cryptic, and this seems connected to the evolving autodiff ecosystem mentioned above. Importantly, though, Turing.jl models have been a joy to write and work with. Something like tidybayes would be a great addition to the ecosystem (there are some similar tools, but none covering the entire tool stack).

There are lots of big pluses though. The crazy lengths we have to go to in order to share Stan code (How to use primarycensored with Stan • primarycensored) or even just document it (primarycensored: Primary Censored Stan Functions) are not something I miss.

The key places to start to get a sense of the differences are:

Turing: https://censoreddistributions.epiaware.org/dev/getting-started/tutorials/fitting-with-turing/
Stan: https://primarycensored.epinowcast.org/articles/fitting-dists-with-stan.html

Then spiral out from there based on the functions used etc.

The Weight Utility

The weight utility in this package really has nothing to do with censored distributions, but we needed weighted likelihoods. Using @addlogprob! directly is clunky and breaks the generative model pattern. Our weight wrapper lets you stay generative while handling weighted data, though it’s a bit of a hack since you need both counts and data points. The nice thing is you can still use the same distribution for both sampling and likelihood calculation. The big downside is that it doesn’t support conditioning (via DynamicPPL.condition), which is sad, as I (Sam) have no idea how to do this. It also doesn’t support anything that isn’t a Distributions.jl object, i.e. submodels. Maybe this could become a proper Turing.jl utility at some point, as one of the package’s main benefits is being generative, so letting users keep doing that seems great.
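For anyone wondering what a weighted likelihood buys you, the idea can be sketched in plain Julia. This is not the package's weight wrapper: `exp_logpdf`, `weighted_loglik` and `expanded_loglik` are hypothetical names and the data are made up. Multiplying each unique observation's log density by its count gives the same log-likelihood as repeating every observation, with far fewer density evaluations.

```julia
# Log density of an exponential delay with mean θ, standing in for any
# distribution's logpdf.
exp_logpdf(θ, x) = x < 0 ? -Inf : -log(θ) - x / θ

delays = [1.0, 2.0, 3.0]     # unique observed delays (made-up data)
counts = [5, 2, 1]           # how many times each delay was observed

# Weighted form: one term per unique value, scaled by its count.
weighted_loglik(θ) = sum(c * exp_logpdf(θ, d) for (d, c) in zip(delays, counts))

# Expanded form: one term per individual observation.
expanded = [d for (d, c) in zip(delays, counts) for _ in 1:c]
expanded_loglik(θ) = sum(exp_logpdf(θ, x) for x in expanded)

println(weighted_loglik(2.0) ≈ expanded_loglik(2.0))
```

With daily outbreak data the number of unique delays is usually tiny compared with the number of cases, which is why this matters for fitting speed.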

Future Plans

This is a prototype for learning about Julia development and exploring ideas for composable Julia modelling. Once we have the Ebola tutorial/example working, the next big task is adding a Turing extension with reusable modules and making it composable with different distribution submodels. See EpiAware’s Rt-without-renewal approach for this kind of thinking. Ideas very welcome.

The goal is to build towards modular, composable epidemic models where you can mix and match different components - delay distributions, transmission models, observation processes - without having to understand every piece of a monolithic model.

Get Involved

Contributions are very welcome. This project is as much about learning good Julia development practices as it is about epidemiology. Check out the documentation, browse the GitHub issues, or join the discussion.

The package is already useful for basic censored distribution work, and we’re building towards something much more comprehensive. If you’re working with epidemiological delays or just interested in composable statistical modelling in Julia, we’d love to hear from you.


@sambrand @medewitt and everyone else from the meetup, we should probably have a chat about this soon. Anyone else is very welcome (react to this and I will invite you).


This is very much me hijacking the conversation, so apologies, but I’m keen to stay on top of these sorts of developments and haven’t tried Julia in like 8 years.

Are there any resources for learning Julia you’d recommend, particularly ones good for stats/epi/IDD, that would help someone get up to speed to be involved in this research?


Been inundated with GitHub emails watching you and Claude work :) Happy to talk more... I have some more use case data I want to run through this. Plus happy to reengage on this project!

This book is great: Julia for Data Analysis - Bogumił Kamiński for “modern” tidyesque approaches to Julia. You can click through the chapters and it appears to launch a window to see all of the content.

The Turing docs are really good along with the SciML docs. Plus Simon Frost has most of the compartment models implemented here. MixedModels is the spiritual successor to lme4.

Curious to hear if others have other resources!


So there are lots of bits and bobs out there, but one of my complaints is that it’s a bit of a mess and Julia has quite a few rough edges they have rounded out with packages etc.

I was treating these docs as somewhat of a Julia entry point (with the eventual plan to move some of them to an org level once more packages come along) so I have a few rough pieces of advice here: Julia · CensoredDistributions.jl (i.e. to combine with working through the tutorials etc).

I need to take another pass at this as I think there are a few more good resources out there I have missed. I would also like to do a version of our NFIDD/SISMID course in Julia, which would be a natural place for this stuff. I also need to run a short course so LSHTM might think about promoting me, and I would like to do something around using Julia for IDE, so watch this space (not much progress on that so far). Related to this, I have autumn plans of writing a “why Julia for IDE” piece (hit me up, anyone who is interested in contributing).

Sorry, not very conclusive as an answer, but I think it should get a bit better in the next 6 months.


Ah @medewitt, good to have a rec on that book, I wasn’t sure if it was worth it. I might check it out. I must say the sparsity of the DataFramesMeta syntax really warms my gnarled data.table heart.

To hijack the hijack: @medewitt, is MixedModels able to integrate into Turing? They have their TuringGLM, but it’s quite first pass, and something Julia is really missing is a brms-like thing. With all the composability, a tool like that could be really incredible.

Haha yes sorry got a bit obsessed (shock).


Oh and @jonathon.mellor Pluto (the thing we write the tutorials in) is a really nice forgiving place to start playing around as it handles a lot of the edges for you and looks nice at the same time.


Oh and I was also thinking of some kind of virtual IDE Julia meetup thing we could do once I have had August to recover from talking to too many people during teaching.


New release adding support for exponentially tilted distributions along with a short tutorial on what that means and why it matters: Release v0.2.0 · EpiAware/CensoredDistributions.jl · GitHub

New release adding support for Turing condition syntax. Updated the tutorial so it does full simulation from the same model that is then fit using fix and condition helpers. Pretty wizzy.

You can see it in use here: Fitting with Turing.jl · CensoredDistributions.jl