Effect of zero padding in epidist

kgostic · 13 March 2024 20:07

Hi All,

I noticed that almost all the models in epidist use a zero-padding approach, where 0-length delays are changed to some small non-zero number, 1e-3 by default.

I find this a little sketchy, but I recognize that it’s necessary because brms throws "Error: Family ‘gamma’ requires response greater than 0" if you pass a 0 into a gamma fit, and I assume similar issues arise for lognormal and Weibull. I see that this is a known issue in brms and one that might be hard to get resolved: Better handling of interval censoring at distributional bounds · Issue #1070 · paul-buerkner/brms · GitHub

I’m wondering if anyone has tested the effect of this zero-padding approach on the estimated distribution? I’m specifically interested in the censoring adjusted delay model right now and was planning to run a test to convince myself, e.g. that I get a similar answer when I pad with 1e-3 and 1e-6.

I think padding is unlikely to have a huge impact in models that account for interval censoring, as it’s just slightly shifting the lower bound of integration. But I think the effect might be greater in other/more naive models where it shifts the actual value of the response variable from 0 to something else.
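As a rough illustration of the "lower bound of integration" point, here is a minimal Python sketch (not epidist or brms code). It assumes a lognormal delay with made-up parameters and a single censoring interval, and compares the probability mass assigned to a day-0 observation when the lower bound is 0, 1e-3, or 1e-6. The names `lognormal_cdf` and `day_zero_mass` are hypothetical helpers.

```python
import math
from statistics import NormalDist


def lognormal_cdf(x, meanlog, sdlog):
    """CDF of a lognormal distribution; zero for x <= 0."""
    if x <= 0:
        return 0.0
    return NormalDist(meanlog, sdlog).cdf(math.log(x))


# Hypothetical delay distribution (illustrative parameters only).
MEANLOG, SDLOG = 0.5, 1.0


def day_zero_mass(lower):
    """Mass on the interval (lower, 1], i.e. a delay recorded as day 0."""
    return lognormal_cdf(1.0, MEANLOG, SDLOG) - lognormal_cdf(lower, MEANLOG, SDLOG)


exact = day_zero_mass(0.0)
for pad in (1e-3, 1e-6):
    # The change in the likelihood contribution is just the CDF at the pad.
    print(f"pad={pad:g}: mass lost to padding = {exact - day_zero_mass(pad):.3e}")
```

The only thing padding changes in this setup is the CDF evaluated at the pad, which is tiny for a distribution with little mass near zero. A distribution with a lot of mass near zero would be a different story.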

Overall, I’d be grateful to hear if anyone else has specifically experimented with this and can confirm that zero-padding in general is safe (i.e. that the fitted model is insensitive to this approach).

kgostic · 13 March 2024 20:09

tagging @samabbott and @sangwoopark who will know what I’m talking about

sangwoopark · 14 March 2024 03:58

Zero-padding is definitely problematic if you’re not accounting for censoring. For example, if you’re fitting a lognormal distribution, whether you pad with 1e-3 or 1e-6 is going to have quite different results.
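A quick way to see this is the sketch below. It is a simulation under assumed parameters, not epidist code: it draws continuous lognormal delays, records them in whole days so that some become 0, and then fits a lognormal while ignoring censoring (the MLE is just the mean and SD of the logged values). The helper names and the parameter values are made up for illustration.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical truth: lognormal delays, observed only as whole days.
TRUE_MEANLOG, TRUE_SDLOG = 1.0, 0.5
delays = [
    math.floor(random.lognormvariate(TRUE_MEANLOG, TRUE_SDLOG))
    for _ in range(5000)
]


def pad_zeros(values, pad):
    """Replace zero-length delays with a small positive number."""
    return [v if v > 0 else pad for v in values]


def naive_lognormal_fit(values):
    """Lognormal MLE that ignores censoring: mean and SD of log(values)."""
    logs = [math.log(v) for v in values]
    return statistics.fmean(logs), statistics.pstdev(logs)


fit_1e3 = naive_lognormal_fit(pad_zeros(delays, 1e-3))
fit_1e6 = naive_lognormal_fit(pad_zeros(delays, 1e-6))

print("zeros in data:", delays.count(0))
print("pad 1e-3 -> meanlog, sdlog:", fit_1e3)
print("pad 1e-6 -> meanlog, sdlog:", fit_1e6)
```

Each padded zero enters the fit as log(pad), so moving from 1e-3 to 1e-6 roughly doubles how far those points sit from the rest of the data on the log scale. That pulls the estimated meanlog down and inflates the sdlog.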

I think we tried experimenting and it doesn’t really matter if we’re accounting for censoring? At least that’s consistent with my intuition too. If you have a ton of zeros that need to be padded, padding could be problematic, but then you might want to consider other approaches in that case (e.g., zero-inflated discrete distribution).

Not sure when you’re planning to run the test but I’m also happy to run some tests/experiments over the weekend too. It might be useful to add a vignette about it anyway.

sangwoopark · 18 March 2024 15:53

Had to spend the weekend working on my thesis… getting to this today!

kgostic · 18 March 2024 16:12

No worries! I wasn’t trying to make work for you. Just curious. We can do our own tire-kicking for now!

sangwoopark · 18 March 2024 17:50

I think it’s good practice to document things in the package anyway (and I need to spend more time on the package so this is a good push for me…). Here’s a crude comparison using interval-reduced censoring (so there’s some bias there already) but just showing that results seem insensitive to padding (and it turns out that dropping zeroes is better than padding in the naive case, which also makes sense).

Code here: epidist/vignettes/zeropad.Rmd at 653645e9cf4bbcd1b6d2c5bad1e640651ab5a2ec · parksw3/epidist · GitHub
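For anyone who wants to poke at this without R, below is a separate, much cruder Python sketch of the same two comparisons. It is not the vignette code and does not use epidist or brms. It assumes lognormal delays with made-up parameters, treats a delay recorded as day d as lying in (max(d, pad), d + 1] (single interval censoring only), and uses a coarse grid search in place of a real optimiser. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import NormalDist, fmean, pstdev

random.seed(2)
TRUE_MEANLOG, TRUE_SDLOG = 1.0, 0.5  # hypothetical values
days = [math.floor(random.lognormvariate(TRUE_MEANLOG, TRUE_SDLOG))
        for _ in range(2000)]
counts = Counter(days)


def naive_fit(values):
    """Lognormal MLE ignoring censoring: mean and SD of log(values)."""
    logs = [math.log(v) for v in values]
    return fmean(logs), pstdev(logs)


def censored_loglik(meanlog, sdlog, pad):
    """Log-likelihood with day d treated as the interval (max(d, pad), d + 1]."""
    dist = NormalDist(meanlog, sdlog)
    total = 0.0
    for day, n in counts.items():
        lower = dist.cdf(math.log(max(day, pad)))
        upper = dist.cdf(math.log(day + 1))
        # Clamp to avoid log(0) from floating-point underflow in the tails.
        total += n * math.log(max(upper - lower, 1e-300))
    return total


def censored_fit(pad):
    """Crude grid-search MLE over (meanlog, sdlog) on a 0.01 grid."""
    grid = [(0.5 + 0.01 * i, 0.3 + 0.01 * j)
            for i in range(101) for j in range(51)]
    return max(grid, key=lambda p: censored_loglik(p[0], p[1], pad))


naive_padded = naive_fit([d if d > 0 else 1e-3 for d in days])
naive_dropped = naive_fit([d for d in days if d > 0])
censored_1e3 = censored_fit(1e-3)
censored_1e6 = censored_fit(1e-6)

print("naive, zeros padded to 1e-3:", naive_padded)
print("naive, zeros dropped:       ", naive_dropped)
print("censored, pad 1e-3:         ", censored_1e3)
print("censored, pad 1e-6:         ", censored_1e6)
```

In this toy setup the censoring-aware fit should not care which pad is used, because the pad only enters through the CDF at a very small value. The naive fit with zeros dropped is still biased by the rounding to whole days, but it avoids the extreme log(pad) points.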

samabbott · 4 April 2024 17:43

Yes, I think this is the correct intuition.

> I think it’s good practice to document things in the package anyway (and I need to spend more time on the package so this is a good push for me…)

Agree.
