I thought I would cross-post this issue from @sambrand as I think it is pretty interesting to discuss in more detail. TL;DR: the idea is to extend our thinking on primary censoring to approach generation interval estimation as a primary censoring problem.
opened 01:30PM - 08 May 25 UTC
enhancement
I think we should consider generation interval (GI) parametric estimation within… the `primarycensored` framework. This is clearly an important delay to make inference on, but requires careful handling.
## Concept outline
I'll give some first pass thoughts on where this fits into `primarycensored`.
### Generation intervals and observables
A standard model for secondary infections is that their number per primary infected is drawn from some offspring distribution and those secondary infection times are drawn independently from the delay distribution $f_G$, which is called the generation interval. For symptomatic infections, there is a further delay until onset of symptoms reaches some threshold, which occurs after infection with a delay distribution $f_O$. For an infector-infectee pair we'll call their respective infection times $I_P$ and $I_S$.
Typically, the infection times are unobserved, but if both primary and secondary infecteds are ascertained due to developing symptoms and the infector-infectee relationship is established then we get (censored) _symptom onset_ times $t_P$ and $t_S$. If the target of interest for inference is the _serial interval_ (which can in principle be negative) then `primarycensored` can handle parametric inference here (albeit maybe with extension to negative "delays"). The serial interval has its own density function $f_S$.
### Serial interval censoring problem
We note that a realisation of the serial interval dist is:
```math
S = G + O_S - O_P
```
Where $O_S$ and $O_P$ are independent copies of $O\sim f_O$. Hence, the _mean_ of the serial interval matches the _mean_ of the generation interval, because $\mathbb{E}[O_S - O_P] = 0$. However, the variance of $S$ will be higher than that of $G$; in general $f_S \neq f_G$.
Therefore, inference on $f_S$ doesn't answer all our possible modelling questions.
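A quick way to see the mean/variance point is by simulation. The sketch below is illustrative only: the gamma distributions and their parameters are assumptions chosen for the example (they are not from the issue), and it additionally assumes $G$ is independent of the onset delays, in which case $\text{Var}(S) = \text{Var}(G) + 2\,\text{Var}(O)$.

```python
import random
import statistics


def simulate_intervals(n, seed=1):
    """Draw n generation intervals G and serial intervals S = G + O_S - O_P.

    Distribution choices are hypothetical, for illustration only:
    G ~ Gamma(shape=2, scale=2.5)  -> mean 5, variance 12.5
    O ~ Gamma(shape=4, scale=1)    -> mean 4, variance 4
    """
    rng = random.Random(seed)
    gi, si = [], []
    for _ in range(n):
        g = rng.gammavariate(2.0, 2.5)    # generation interval
        o_p = rng.gammavariate(4.0, 1.0)  # primary infection-to-onset delay
        o_s = rng.gammavariate(4.0, 1.0)  # secondary infection-to-onset delay
        gi.append(g)
        si.append(g + o_s - o_p)
    return gi, si


gi, si = simulate_intervals(100_000)
mean_g, mean_s = statistics.fmean(gi), statistics.fmean(si)
var_g, var_s = statistics.variance(gi), statistics.variance(si)
frac_negative = sum(s < 0 for s in si) / len(si)

print(f"mean G = {mean_g:.2f}, mean S = {mean_s:.2f}")
print(f"var  G = {var_g:.2f}, var  S = {var_s:.2f}")
print(f"fraction of negative serial intervals = {frac_negative:.3f}")
```

The means agree up to Monte Carlo error, the serial interval variance is inflated by roughly $2\,\text{Var}(O)$, and some serial intervals come out negative even though every generation interval is positive.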
### Generation interval estimation as a censoring problem on onset times
#### Primary infection time censoring
In any plausible scenario the symptom onset distribution has a maximum value $O_{\text{max}}$. This allows us to bound the possible values of $I_P$ from an observation of the censored primary onset time $t_P$.
```math
I_P \in [t_P - O_{\text{max}}, t_P + w_P).
```
Where $w_P$ is the censoring window _on the primary onset time_, e.g. $w_P = 1$ day.
We can consider this interval to be a censoring window _on the primary infection time_; that is, it is an interval bound on a latent random variable. However, in this interpretation the density of infection time within the interval (conditional on being somewhere in the interval) is
```math
f_P(t) \propto \int_{t_P}^{t_P + w_P} \exp(r t) f_O(u - t) du,\qquad t \in [t_P - O_{\text{max}}, t_P + w_P).
```
Which is the usual density for an infection process growing at exponential rate $r$ reweighted by the observation that the primary onset arrives at some point in $[t_P, t_P + w_P)$. In a lot of situations this is a tractable prior for the infection time.
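To make the shape of this prior concrete, here is a minimal numerical sketch. It uses the fact that the inner integral is just a CDF difference, $\int_{t_P}^{t_P + w_P} f_O(u - t)\,du = F_O(t_P + w_P - t) - F_O(t_P - t)$. The truncated Weibull onset distribution, its parameters, and all function names are assumptions made for the example; none of this is the `primarycensored` API.

```python
import math


def weibull_cdf(x, shape=2.0, scale=5.0):
    """Weibull CDF; hypothetical stand-in for the onset delay distribution."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0


def truncated_onset_cdf(x, o_max):
    """F_O for the onset delay truncated to [0, o_max]."""
    if x <= 0:
        return 0.0
    if x >= o_max:
        return 1.0
    return weibull_cdf(x) / weibull_cdf(o_max)


def infection_time_prior(t_p, w_p, r, o_max, n_grid=400):
    """Normalised f_P on a midpoint grid over [t_p - o_max, t_p + w_p).

    f_P(t) is proportional to exp(r t) * [F_O(t_p + w_p - t) - F_O(t_p - t)].
    """
    lower, upper = t_p - o_max, t_p + w_p
    step = (upper - lower) / n_grid
    grid = [lower + (i + 0.5) * step for i in range(n_grid)]
    weights = [
        # exp(r * (t - t_p)) is proportional to exp(r * t) but avoids overflow
        math.exp(r * (t - t_p))
        * (truncated_onset_cdf(t_p + w_p - t, o_max)
           - truncated_onset_cdf(t_p - t, o_max))
        for t in grid
    ]
    total = sum(weights) * step
    density = [w / total for w in weights]
    return grid, density, step


grid, density, step = infection_time_prior(t_p=20.0, w_p=1.0, r=0.1, o_max=15.0)
mean_infection_time = sum(t * d for t, d in zip(grid, density)) * step
print(f"prior mean infection time: {mean_infection_time:.2f}")
```

A positive growth rate $r$ tilts the prior towards more recent infection times, which is the link to an external growth rate prior.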
#### Secondary infection time censoring
_If_ we knew $I_P$ precisely within the interval $[t_P - O_{\text{max}}, t_P + w_P)$ then the delay between $I_P$ and the secondary onset time is the sum $G + O_S$. Therefore, the likelihood of the secondary onset time $I_P + G + O_S$ falling in its censoring interval $[t_S, t_S + w_S)$ is
```math
F_{G+O}(t_S + w_S - I_P) - F_{G+O}(t_S - I_P).
```
Where $F_{G+O} = F_G \ast f_O = F_O \ast f_G$ [as standard](https://en.wikipedia.org/wiki/Convolution_of_probability_distributions). Correcting for not knowing $I_P$ precisely, but rather having a primary censoring interval with a conditional density, is the main function of this package.
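As a rough illustration of the known-$I_P$ case, the sketch below evaluates $F_{G+O}(x) = \int_0^x F_G(x - u) f_O(u)\,du$ with a midpoint rule and takes the difference across the secondary censoring window. The Weibull choices for $G$ and $O$, the parameter values, and the function names are all hypothetical; a real implementation would likely use a better quadrature scheme.

```python
import math


def weibull_cdf(x, shape, scale):
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0


def weibull_pdf(x, shape, scale):
    if x <= 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def convolved_cdf(x, cdf_g, pdf_o, n_grid=2000):
    """F_{G+O}(x) = int_0^x F_G(x - u) f_O(u) du via the midpoint rule."""
    if x <= 0:
        return 0.0
    step = x / n_grid
    total = 0.0
    for i in range(n_grid):
        u = (i + 0.5) * step
        total += cdf_g(x - u) * pdf_o(u)
    return total * step


def secondary_onset_likelihood(i_p, t_s, w_s, cdf_g, pdf_o):
    """P(secondary onset in [t_s, t_s + w_s)) given a known infection time i_p."""
    upper = convolved_cdf(t_s + w_s - i_p, cdf_g, pdf_o)
    lower = convolved_cdf(t_s - i_p, cdf_g, pdf_o)
    return upper - lower


# Hypothetical example distributions: G ~ Weibull(2, 6), O ~ Weibull(2, 5).
def example_cdf_g(x):
    return weibull_cdf(x, 2.0, 6.0)


def example_pdf_o(x):
    return weibull_pdf(x, 2.0, 5.0)


lik = secondary_onset_likelihood(0.0, 9.0, 1.0, example_cdf_g, example_pdf_o)
print(f"P(secondary onset on day 9 | I_P = 0) = {lik:.4f}")
```

Passing the distributions in as callables keeps $f_G$ swappable, which matters here because $f_G$ is the inference target while $f_O$ would typically be fixed or estimated separately.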
#### Comparison to standard usage of `primarycensored`
Comparing the standard usage of `primarycensored` to the above:
- **Standard**: $f_P$ represents the density of primary event arrivals growing exponentially at rate $r$. **GI inference**: $f_P$ is reweighted by a partial convolution with $f_O$.
- **Standard**: Delay distribution $f_T$ is the direct target for inference. **GI inference**: Delay distribution $f_T = f_G \ast f_O$ where $f_G$ is the main target for inference.
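Putting the two GI-inference ingredients together, the likelihood for one infector-infectee pair is the interval probability under $F_T = F_G \ast f_O$ averaged over the reweighted primary density $f_P$. The following is a sketch under stated assumptions, not the package's implementation: the truncated Weibull onset distribution, the Weibull GI family, the grid sizes, and every function name are hypothetical.

```python
import math

O_MAX = 15.0                       # assumed maximum infection-to-onset delay
O_SHAPE, O_SCALE = 2.0, 5.0        # assumed onset delay parameters
O_NORM = 1.0 - math.exp(-((O_MAX / O_SCALE) ** O_SHAPE))  # truncation constant


def onset_cdf(x):
    if x <= 0:
        return 0.0
    if x >= O_MAX:
        return 1.0
    return (1.0 - math.exp(-((x / O_SCALE) ** O_SHAPE))) / O_NORM


def onset_pdf(x):
    if x <= 0 or x >= O_MAX:
        return 0.0
    z = x / O_SCALE
    return (O_SHAPE / O_SCALE) * z ** (O_SHAPE - 1) * math.exp(-(z ** O_SHAPE)) / O_NORM


def gi_cdf(x, shape, scale):
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0


def delay_cdf(x, gi_shape, gi_scale, n_grid=200):
    """F_T(x) with F_T = F_G * f_O, by midpoint rule over u in (0, min(x, O_MAX))."""
    if x <= 0:
        return 0.0
    step = min(x, O_MAX) / n_grid
    mids = ((i + 0.5) * step for i in range(n_grid))
    return step * sum(gi_cdf(x - u, gi_shape, gi_scale) * onset_pdf(u) for u in mids)


def primary_masses(t_p, w_p, r, n_grid=100):
    """Grid points and probability masses of f_P on [t_p - O_MAX, t_p + w_p)."""
    step = (O_MAX + w_p) / n_grid
    grid = [t_p - O_MAX + (i + 0.5) * step for i in range(n_grid)]
    weights = [
        math.exp(r * (t - t_p)) * (onset_cdf(t_p + w_p - t) - onset_cdf(t_p - t))
        for t in grid
    ]
    total = sum(weights)
    return grid, [w / total for w in weights]


def pair_likelihood(t_p, w_p, t_s, w_s, r, gi_shape, gi_scale):
    """Interval probability for the secondary onset, averaged over f_P."""
    grid, masses = primary_masses(t_p, w_p, r)
    return sum(
        m * (delay_cdf(t_s + w_s - t, gi_shape, gi_scale)
             - delay_cdf(t_s - t, gi_shape, gi_scale))
        for t, m in zip(grid, masses)
    )


# Onsets on day 10 (primary) and day 15 (secondary), daily censoring, r = 0.1.
lik_plausible = pair_likelihood(10.0, 1.0, 15.0, 1.0, 0.1, 2.0, 5.0)
lik_implausible = pair_likelihood(10.0, 1.0, 15.0, 1.0, 0.1, 2.0, 50.0)
print(lik_plausible, lik_implausible)
```

In this toy setup a GI scale of 5 days explains a 5-day onset gap far better than a scale of 50 days, which is the signal a parametric fit of $f_G$ would exploit across many pairs.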
### Implementation
The outline above I think shows where we can put GI inference into our current framework. The actual implementation details are not as clear to me but hinge on how conveniently we can represent the _implicit_ distributions $f_P$ and $f_T$ in our code.
It would be very nice if we/someone can get this going. Conditional on the GI stuff working nicely (and I think it will), that opens the door to joint inference on GI and incubation delay distributions, which would be really useful work for analytic PH I think.
Yes I agree.
I think implementation wise I see a fairly clear path and could do it with spare capacity. Write up and eval is a whole other thing though.
I am also keen for people to pick holes here or suggest other elements. Something I like about this presentation is it makes linking to another model quite easy (via the growth rate prior) so having this in Julia where we are thinking about model composability makes a lot of sense to me.