Effect of zero padding in epidist

Hi All,

I noticed that almost all the models in epidist use a zero-padding approach, where 0-length delays are changed to some small non-zero number, 1e-3 by default.

I find this a little sketchy, but I recognize that it’s necessary because brms throws "Error: Family ‘gamma’ requires response greater than 0" if you pass a 0 into a gamma fit, and I assume similar issues arise for lognormal and Weibull. I see that this is a known issue in brms, and one that might be hard to get resolved: Better handling of interval censoring at distributional bounds · Issue #1070 · paul-buerkner/brms · GitHub

I’m wondering if anyone has tested the effect of this zero-padding approach on the estimated distribution? I’m specifically interested in the censoring-adjusted delay model right now and was planning to run a test to convince myself, e.g. by checking that I get a similar answer when I pad with 1e-3 and with 1e-6.

I think padding is unlikely to have a huge impact in models that account for interval censoring, as it’s just slightly shifting the lower bound of integration. But I think the effect might be greater in other, more naive models, where it shifts the actual value of the response variable from 0 to something else.
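To make the "lower bound of integration" point concrete, here is a minimal sketch in plain Python (not epidist or brms code). It assumes a lognormal delay with made-up parameters (meanlog = 0.5, sdlog = 0.8) and a delay censored to the interval from 0 to 1 day. Padding replaces the interval (0, 1) with (pad, 1), so the probability mass removed from that observation's likelihood term is just the CDF evaluated at the pad value.

```python
import math


def lognormal_cdf(x, meanlog, sdlog):
    """P(X <= x) for a lognormal; erfc keeps precision far in the lower tail."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - meanlog) / sdlog
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


# Hypothetical parameters, chosen only for illustration.
MEANLOG, SDLOG = 0.5, 0.8

# Likelihood term for a delay known only to lie between 0 and 1 day.
exact = lognormal_cdf(1.0, MEANLOG, SDLOG) - lognormal_cdf(0.0, MEANLOG, SDLOG)

# Moving the lower bound from 0 to `pad` removes exactly F(pad) from that term.
lost_mass = {pad: lognormal_cdf(pad, MEANLOG, SDLOG) for pad in (1e-3, 1e-6)}

for pad, lost in lost_mass.items():
    print(f"pad={pad:g}: P(0 < X < 1) = {exact:.4f}, mass lost to padding = {lost:.3e}")
```

How small the lost mass is depends on the assumed parameters: a distribution with a lot of density near zero would lose more, so this is a sanity check rather than a general guarantee.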

Overall, I’d be grateful to hear if anyone else has specifically experimented with this and can confirm that zero-padding in general is safe (i.e. that the fitted model is insensitive to this approach).

tagging @samabbott and @sangwoopark who will know what I’m talking about :slight_smile:

Zero-padding is definitely problematic if you’re not accounting for censoring. For example, if you’re fitting a lognormal distribution, padding with 1e-3 versus 1e-6 is going to give quite different results, because the fit works on the log scale, where those two values are far apart.
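A small sketch of why the naive lognormal fit is sensitive to the pad value, using only the Python standard library. The delays below are made up for illustration, and the "fit" is the closed-form lognormal maximum likelihood estimate (mean and standard deviation of the log delays), not the brms model that epidist actually runs.

```python
import math
import statistics


def pad_zeros(delays, pad):
    """Replace zero-length delays with a small positive value."""
    return [d if d > 0 else pad for d in delays]


def fit_lognormal(delays):
    """Closed-form lognormal MLE: mean and (population) sd of the log delays."""
    logs = [math.log(d) for d in delays]
    return statistics.fmean(logs), statistics.pstdev(logs)


# Made-up daily delays containing two zeros.
delays = [0, 0, 1, 1, 2, 2, 3, 4, 5, 7]

fits = {pad: fit_lognormal(pad_zeros(delays, pad)) for pad in (1e-3, 1e-6)}

for pad, (meanlog, sdlog) in fits.items():
    print(f"pad={pad:g}: meanlog={meanlog:.3f}, sdlog={sdlog:.3f}")
```

Each padded zero contributes log(pad) to the fit, which is about -6.9 for 1e-3 and about -13.8 for 1e-6, so a handful of zeros is enough to drag both parameters around.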

I think we tried experimenting and it doesn’t really matter if we’re accounting for censoring? At least that’s consistent with my intuition too. If you have a ton of zeros that need to be padded, padding could be problematic, but then you might want to consider other approaches in that case (e.g., zero-inflated discrete distribution).

Not sure when you’re planning to run the test, but I’m happy to run some tests/experiments over the weekend too. It might be useful to add a vignette about it anyway.


Had to spend the weekend working on my thesis… getting to this today!

No worries! I wasn’t trying to make work for you. Just curious. We can do our own tire-kicking for now!

I think it’s good practice to document things in the package anyway (and I need to spend more time on the package so this is a good push for me…). Here’s a crude comparison using interval-reduced censoring (so there’s some bias there already) but just showing that results seem insensitive to padding (and it turns out that dropping zeroes is better than padding in the naive case, which also makes sense).
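For anyone who wants to poke at the naive case without running the vignette, here is a rough standalone sketch in plain Python. It is not the vignette's code: it assumes a lognormal delay with made-up parameters, discretizes by flooring to whole days (a simplification of real double censoring), and compares the log-scale mean from a naive fit when zeros are dropped versus padded with 1e-3.

```python
import math
import random
import statistics

# Hypothetical "true" delay distribution, chosen only for illustration.
TRUE_MEANLOG, TRUE_SDLOG = 1.0, 0.5

rng = random.Random(1)  # fixed seed so the comparison is reproducible

# Continuous delays floored to whole days; some become exactly 0.
observed = [
    math.floor(rng.lognormvariate(TRUE_MEANLOG, TRUE_SDLOG)) for _ in range(5000)
]


def naive_meanlog(delays):
    """Naive lognormal estimate of meanlog: the mean of the log delays."""
    return statistics.fmean([math.log(d) for d in delays])


dropped = [d for d in observed if d > 0]             # discard zero delays
padded = [d if d > 0 else 1e-3 for d in observed]    # replace zeros with 1e-3

err_drop = abs(naive_meanlog(dropped) - TRUE_MEANLOG)
err_pad = abs(naive_meanlog(padded) - TRUE_MEANLOG)

print(f"zeros: {len(observed) - len(dropped)} of {len(observed)}")
print(f"abs error in meanlog, dropping zeros: {err_drop:.3f}")
print(f"abs error in meanlog, padding zeros:  {err_pad:.3f}")
```

Both naive estimates are biased here because flooring is ignored. The comparison only concerns which treatment of the zeros moves the estimate further from the assumed truth, and the answer may change with other parameters or discretization schemes.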

Code here: epidist/vignettes/zeropad.Rmd at 653645e9cf4bbcd1b6d2c5bad1e640651ab5a2ec · parksw3/epidist · GitHub


Yes, I think this is the correct intuition.

“I think it’s good practice to document things in the package anyway (and I need to spend more time on the package so this is a good push for me…)”

Agree.