Documentation: CensoredDistributions.epiaware.org
Warning: I wrote this leaning heavily on my LLM pal, hence it being so much more up beat than my normal style!
We’re excited to share CensoredDistributions.jl, born from the 2024 Epistorm Rt collaborathon in Boston where most of our team gathered to tackle real-time epidemic parameter estimation. The package mirrors the R package primarycensored
but brings Julia’s ecosystem to the world of censored distributions.
It tackles the messy reality of outbreak data where delay distributions are comprised of events only known to occur within time windows - both the primary event (like exposure) and secondary observations (like daily reporting) introduce censoring that needs to be handled properly. It’s also interesting to compare implementations - Julia’s multiple dispatch turns out to be very nice for this problem, letting us cleanly handle different distribution types and censoring scenarios.
Try It Now
Want to see it in action? Here’s a few lines that sets up a temporary environment and runs a demo with plots:
using Pkg
Pkg.activate(temp=true)
Pkg.add(["CensoredDistributions", "Distributions", "UnicodePlots"])
include(download("https://gist.github.com/seabbs/3cbd4e5bcdba30deec3081ba3838d556/raw/censoreddistributions-demo-gist.jl"))
Why Event Censored Distributions Matter
If you’re working with outbreak data, you’ll recognise this problem. Someone gets infected, develops symptoms days later, gets tested a few days after that, and the results take another few days to process. By the time you see the data, you’re looking at a chain of delays where each step introduces uncertainty about timing.
The Charniga et al. (2024) paper demonstrates just how important it is to handle this censoring and potential truncation properly. Get it wrong and your epidemic parameters will be biased, which matters when you’re trying to understand transmission dynamics or forecast case trends.
What is it?
CensoredDistributions.jl implements three main types of censoring:
Primary event censoring: When the initial event (like infection) happens within a time window, but you don’t know exactly when.
Interval censoring: When continuous events get binned into discrete time intervals (think daily case reports).
Double interval censoring: Combines both - because epidemiological delay distribution data often has both types of censoring.
The package integrates with Distributions.jl and Turing.jl. See Fitting with Turing.jl · CensoredDistributions.jl for a pretty fun little tutorial. Loving DataFramesMeta
.
What’s Missing (And What’s Coming)
The big missing piece is exponentially tilted priors (issue #62). These let you model primary events that happen at exponentially changing rates during epidemics. Without them, you’re assuming infections are equally likely throughout a censoring window, which isn’t realistic during exponential growth or decay.
We’re also planning to cover the same functionality as the epidist
R package, including partially pooled delay distribution fitting. There’s an open issue for an Ebola outbreak analysis vignette that will demonstrate this. Due to the lack of a brms
equivalent in Julia, this will have to be a bit more manual and clunky. If anyone is interested on working on a brms
equivalent, we’d love it to happen and I (Sam) would be happy to help.
The plan is to then write a Turing extension package with models for each of the key distribution helpers we support and submodels for the underlying distributions. See the future plans for where this might be going (fun times).
Autodiff is also proving tricky. We’re finding that some distributions don’t play nicely with all the different autodiff packages (ForwardDiff, Zygote, Enzyme, etc.), though we’re still learning the ins and outs of this ecosystem. We’re adding tests using DynamicPPL’s run_ad
function, but it would be nice to have more standardised tooling for this - maybe badges showing which autodiff backends work for a given package.
Turing vs Stan
We’ve been comparing Turing.jl and Stan for this work. Turing’s generative approach means you write models that look like data-generating processes, and it produces cleaner models with less code. The error messages can be cryptic, though, and this seems connected to the evolving autodiff ecosystem mentioned above. Importantly, though, Turing.jl models have been a joy to write and work with. Something like tidybayes
would be a great addition to the ecosystem (there are some similar tools, but not covering the entire tool stack).
There are lots of big pluses though. The crazy lengths we have to go to in order to share stan code ( How to use primarycensored with Stan • primarycensored ) or even just doc it ( primarycensored: Primary Censored Stan Functions ) is not something I miss.
The key place to start to get a sense of the differences are:
Turing: https://censoreddistributions.epiaware.org/dev/getting-started/tutorials/fitting-with-turing/
Stan: https://primarycensored.epinowcast.org/articles/fitting-dists-with-stan.html
Then spiral out from their based on the functions used etc.
The Weight Utility
The weight
utility in this package really has nothing to do with censored distributions, but we needed weighted likelihoods. Using @addlogprob
directly is clunky and breaks the generative model pattern. Our weight
wrapper lets you stay generative while handling weighted data, though it’s a bit of a hack since you need both counts and data points. The nice thing is you can still use the same distribution for both sampling and likelihood calculation. The big downsides are that it doesn’t support conditioning which is sad (via DynamicPPL.condition
), as I (Sam) have no idea how to do this. It also doesn’t support anything that isn’t an Distribution.jl
object, i.e submodels. Maybe this could become a proper Turing.jl utility at some point as one of the package’s main benefits is being generative, so letting users keep doing that seems great.
Future Plans
This is a prototype for learning about Julia development and exploring ideas for composable Julia modelling. Once we have the Ebola tutorial/example working, the next big task is adding a Turing extension with reusable modules and making it composable with different distribution submodels. See EpiAware’s Rt-without-renewal approach for this kind of thinking. Ideas very welcome.
The goal is to build towards modular, composable epidemic models where you can mix and match different components - delay distributions, transmission models, observation processes - without having to understand every piece of a monolithic model.
Get Involved
Contributions are very welcome. This project is as much about learning good Julia development practices as it is about epidemiology. Check out the documentation, browse the GitHub issues, or join the discussion.
The package is already useful for basic censored distribution work, and we’re building towards something much more comprehensive. If you’re working with epidemiological delays or just interested in composable statistical modelling in Julia, we’d love to hear from you.