Identifying Changing Relationships between Primary and Secondary Outcomes

A project that I’ve been procrastinating on for a bit, per the subject line:

Epidemics / pandemics / routine surveillance have primary outcomes (e.g. cases) and secondary outcomes (e.g. hospitalizations, deaths, …). Generally, the primary one is the indicator of transmission, but the secondary one is the public health outcome we care more about.

At least during COVID-19, cases were a leading indicator for trends in deaths/hospitalizations. But notably at the outset of the Omicron wave (though also at other stages), the relative relationship between cases and deaths (i.e. the case-fatality ratio) changed markedly. However, that change wasn’t immediately clear at the time, and it was a subject of great interest for decision-making (e.g. how intensely should we react? what solutions, like waiving healthcare worker test+ requirements, can we entertain?).

Based on some preliminary work, it seems like we could use the epinowcast-style tools to identify when the relationship between primary and secondary outcomes is changing. I’m being deliberately precise here about what we’re trying to do: what is wanted is a clear indicator that the confluence of many factors (detection of primary and secondary outcomes; the fraction of primary outcomes leading to secondary ones; the time delays associated with everything) has changed. I think something like “the CFR is going up/down” probably can’t come out of just the two time series. But scoring the relative predictive capability of history vs. just the present seems doable.


So in essence the aim is to estimate the relationship between two indicators where there may be both a delay and a scaling, in the presence of right truncation for both indicators? This has definitely come up a few times in my work, and the methods I have seen are generally designed for idealised data or are really just very simplistic. For example, taking external delay estimates and convolving between the two time series is a common “solution” to this issue, but I think not really enough for robust real-world usage; the same goes for the more complex version where you also estimate the delay (Estimate a Secondary Observation from a Primary Observation — estimate_secondary • EpiNow2 and GitHub - epiforecasts/idbrms: Population-level infectious disease modelling as an extension of brms are examples).

I think we could do something like this with the epinowcast framework, and get a lot of benefits from our existing machinery, if we added a linkage between strata. In the simple case that could look like,

S_t = \sum^D_{d=0}\alpha_{t-d} f_{t-d}(d) P_{t-d}

where P_t is the primary indicator, S_t is the secondary indicator, \alpha_t is some (potentially time-varying) scaling factor (e.g. the CFR in the case of cases and deaths), and f_t(d) is some (potentially time-varying) delay probability mass function.
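As a purely illustrative sketch of this convolution (not package code — all parameter values, the wave shape, and the delay PMF are made-up assumptions), here is a minimal NumPy simulation with a step change in the scaling \alpha partway through the series, which is exactly the kind of change we would want to flag:

```python
import numpy as np

rng = np.random.default_rng(42)

T = 60   # length of the time series (assumption)
D = 14   # maximum delay considered (assumption)
t = np.arange(T)

# Primary indicator: a synthetic wave that grows then declines (illustrative)
primary = np.round(100 * np.exp(0.08 * t - 0.002 * t**2)).astype(int)

# Delay PMF f(d): illustrative geometric-style decay over 0..D days
delay_pmf = np.exp(-0.25 * np.arange(D + 1))
delay_pmf /= delay_pmf.sum()

# Scaling alpha_t: a CFR that drops mid-series (the change we want to detect)
alpha = np.where(t < 30, 0.02, 0.008)

# S_t = sum_{d=0}^{D} alpha_{t-d} f(d) P_{t-d}
secondary_mean = np.zeros(T)
for tt in range(T):
    for d in range(min(tt, D) + 1):
        secondary_mean[tt] += alpha[tt - d] * delay_pmf[d] * primary[tt - d]

# Observed secondary counts with Poisson noise
secondary = rng.poisson(secondary_mean)
```

Estimating \alpha_t and f(d) back from (primary, secondary) while both are right truncated is then the hard part the framework would need to handle.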

Then both could be right truncation adjusted/nowcasted within the framework of the package.

At the moment we use a renewal equation (or some simplification thereof) and latent delays as our generative process. That looks something like this (see Model definition and implementation • epinowcast for more detail), where \lambda_{g,t} would cover both S_t and P_t as defined above (i.e. they would be separate strata):

\lambda^l_{g,t} = R_{g,t} \sum_{p = 1}^{P} G_{g}\left(p, t - p \right) \lambda^l_{g, t-p}
\lambda_{g,t} = \nu_{g,t} \sum_{\tau = 0}^{L - 1} F_{g}\left(\tau + 1, t - \tau \right) \lambda^l_{g, t - \tau}
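To make that generative process concrete, here is a rough single-group sketch (in NumPy rather than the package’s actual R/Stan machinery) of the two equations above, assuming fixed illustrative PMFs for G and F, a step change in R_t, and a constant \nu — all values are assumptions for illustration:

```python
import numpy as np

T = 50       # time points (assumption)
P_max = 7    # generation-time support P (assumption)
L = 10       # latent delay support L (assumption)

# Fixed PMFs for illustration; in epinowcast G and F can vary by group and time
gen_pmf = np.array([0.1, 0.3, 0.25, 0.15, 0.1, 0.06, 0.04])  # G(p, .), p = 1..P_max
latent_pmf = np.exp(-0.3 * np.arange(L))                     # F(tau + 1, .), tau = 0..L-1
latent_pmf /= latent_pmf.sum()

R = np.where(np.arange(T) < 25, 1.3, 0.8)  # step change in R_t
nu = 1.0                                   # scaling nu_{g,t}, constant here

# Latent expectation via the renewal equation
lam_latent = np.zeros(T)
lam_latent[0] = 10.0  # seed
for t in range(1, T):
    for p in range(1, min(t, P_max) + 1):
        lam_latent[t] += R[t] * gen_pmf[p - 1] * lam_latent[t - p]

# Observed expectation via the latent delay convolution, scaled by nu
lam_obs = np.zeros(T)
for t in range(T):
    for tau in range(min(t, L - 1) + 1):
        lam_obs[t] += nu * latent_pmf[tau] * lam_latent[t - tau]
```

The proposed linkage would amount to tying the \lambda series of two strata together (e.g. with \nu playing the role of \alpha above) rather than simulating them independently as here.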

\nu_{g,t} already acts as a scaling here for individual variables (i.e. as some kind of CFR etc., depending on the setting), and R_t can be specified jointly — though we potentially want to rethink this so that it impacts the hazard of F, as suggested by @adrianlison. A potential option would be to try and model the relationship between variables via the generation time (as this just encodes a relationship between current and past \lambda). This is something we might want to do anyway to get at modelling things like imported cases and spatial interaction. It definitely needs a bit more thought on exactly what it would look like in order to support the more general case where data have different observation processes. We would also need to add the ability to specify a formula for observation-process variables (e.g. the overdispersion of the negative binomial), as these would definitely differ between most indicators.

There may also be other ways of doing this. The downside of this approach is that it would model the scaling relationship at the time of report rather than at the latent time (i.e. infection), which might not be suitable for all use cases (though this could be made more flexible). It would also require much greater control over priors and a lot more machinery for flexible generation times (and for estimating variation in them) than we currently have.

Having this in place would also put us in a good position to start modelling other forms of data, though that would definitely take quite a bit more thought as there are :dragon: there.

To really get started on this we would ideally need some example data that show these features, and potentially a simple simulator that can generate a few target scenarios.
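As a starting point for such a simulator, here is one possible sketch (the function name, parameter values, and step-change scenarios are all illustrative assumptions, not package API): it generates a primary indicator from a renewal process and a secondary indicator by the scaled-and-delayed convolution discussed above, with step changes in both R_t and the CFR as a target scenario to recover:

```python
import numpy as np

def simulate_scenario(T=80, seed=1):
    """Simulate linked primary/secondary counts (illustrative assumptions):
    renewal-process primary indicator, plus a secondary indicator formed by
    a time-varying scaling (CFR) and a reporting-delay convolution."""
    rng = np.random.default_rng(seed)
    t = np.arange(T)

    R = np.where(t < T // 2, 1.2, 0.9)       # step change in transmission
    cfr = np.where(t < T // 2, 0.02, 0.005)  # step change in the CFR

    gen_pmf = np.array([0.15, 0.3, 0.25, 0.15, 0.1, 0.05])   # generation time
    delay_pmf = np.exp(-0.2 * np.arange(15))                 # primary-to-secondary delay
    delay_pmf /= delay_pmf.sum()

    # Primary expectation via the renewal equation, then Poisson noise
    primary_mean = np.zeros(T)
    primary_mean[0] = 20.0  # seed
    for tt in range(1, T):
        for p in range(1, min(tt, len(gen_pmf)) + 1):
            primary_mean[tt] += R[tt] * gen_pmf[p - 1] * primary_mean[tt - p]
    primary = rng.poisson(primary_mean)

    # Secondary: S_t = sum_d cfr_{t-d} f(d) P_{t-d}, plus Poisson noise
    secondary_mean = np.zeros(T)
    for tt in range(T):
        for d in range(min(tt, len(delay_pmf) - 1) + 1):
            secondary_mean[tt] += cfr[tt - d] * delay_pmf[d] * primary[tt - d]
    secondary = rng.poisson(secondary_mean)

    return primary, secondary

primary, secondary = simulate_scenario()
```

Extending this with right truncation of both series (dropping the most recent reports at a given "as of" date) would give test cases for the nowcasting side as well.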