So, in essence, the aim is to estimate the relationship between two indicators, where that relationship may involve both a delay and a scaling, in the presence of right truncation for both indicators? This has definitely come up a few times in my work, and the methods I have seen are generally designed for idealised data or are really quite simplistic. For example, taking external delay estimates and convolving between two time series is a common "solution" to this issue, but I think it isn't really enough for robust real-world usage, or for the more complex version where you also estimate the delay (Estimate a Secondary Observation from a Primary Observation — estimate_secondary • EpiNow2 and GitHub - epiforecasts/idbrms: Population-level infectious disease modelling as an extension of brms are examples).

I think we could do something like this with the `epinowcast` framework, and get a lot of benefits from our existing machinery, if we added a linkage between strata. In the simple case that could look like,

S_t = \sum_{d=0}^{D} \alpha_{t-d} f_{t-d}(d) P_{t-d}

where P_t is the primary indicator, S_t is the secondary indicator, \alpha_t is some scaling factor (e.g. the CFR in the case of cases and deaths), and f_t(d) is some delay probability mass function.

Then both could be right truncation adjusted/nowcasted within the framework of the package.
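As a concrete illustration of the linkage above, here is a minimal sketch of the secondary observation model with a time-constant scaling and delay PMF (all names, distributions, and parameter values are illustrative assumptions, not package code):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical primary indicator P_t (e.g. daily cases).
T = 60
P = rng.poisson(100, size=T).astype(float)

# Assumed constant scaling alpha (e.g. a CFR of 10%) and a discretised
# delay PMF f(d) over d = 0..D.
alpha = 0.1
D = 14
f = np.exp(-0.3 * np.arange(D + 1))
f /= f.sum()  # normalise so f is a proper PMF

# S_t = sum_{d=0}^{D} alpha * f(d) * P_{t-d} (time-constant alpha and f)
S = np.zeros(T)
for t in range(T):
    for d in range(min(t, D) + 1):
        S[t] += alpha * f[d] * P[t - d]
```

In the real model \alpha and f would be time-varying and estimated, but this shows the basic convolution-plus-scaling structure being proposed.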

At the moment we use a renewal equation (or some simplification thereof) and latent delays as our generative process. That looks something like the following (see Model definition and implementation • epinowcast for more detail), where \lambda_{g,t} would correspond to both S_t and P_t as defined above (i.e. they would be separate strata).

\lambda^l_{g,t} = R_{g,t} \sum_{p = 1}^{P} G_{g}\left(p, t - p \right) \lambda^l_{g, t-p}

\lambda_{g,t} = \nu_{g,t} \sum_{\tau = 0}^{L - 1} F_{g}\left(\tau + 1, t - \tau \right) \lambda^l_{g, t - \tau}

\nu_{g,t} is already acting as a scaling here for individual variables (e.g. as some kind of CFR, depending on the setting), though we potentially want to rethink this so that it impacts the hazard of F, as suggested by @adrianlison. R_{g,t} can also be specified jointly. A potential option would be to model the relationship between variables via the generation time, as this is just indicating a relationship between current and past \lambda. This is something we might want to do anyway to get at modelling things like imported cases and spatial interaction. It definitely needs a bit more thought on exactly what that would look like in order to support this more general case where data have different observation processes. We would also need to add the ability to specify the formula for observation process variables (e.g. the overdispersion for the negative binomial), as these would definitely differ between most indicators.
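For a single group, the generative process above can be sketched with time-constant stand-ins for G, F, and \nu (everything here is a simplifying assumption for illustration; the actual model allows these to vary over time and strata):

```python
import numpy as np

# Illustrative, time-constant stand-ins for the model quantities;
# names and values are assumptions for the sketch, not package code.
T, P_max, L = 50, 7, 10
G = np.ones(P_max) / P_max                     # generation time PMF G(p), p = 1..P_max
F = np.exp(-0.4 * np.arange(L))                # delay PMF F(tau + 1), tau = 0..L-1
F /= F.sum()
R = np.full(T, 1.2)                            # reproduction number R_{g,t}
nu = 0.1                                       # scaling nu_{g,t} (e.g. a CFR)

# Latent process: lambda^l_t = R_t * sum_p G(p) * lambda^l_{t-p}
lam_l = np.zeros(T)
lam_l[0] = 10.0  # seed infections
for t in range(1, T):
    lam_l[t] = R[t] * sum(
        G[p - 1] * lam_l[t - p] for p in range(1, min(t, P_max) + 1)
    )

# Observation process: lambda_t = nu * sum_tau F(tau + 1) * lambda^l_{t-tau}
lam = np.zeros(T)
for t in range(T):
    lam[t] = nu * sum(F[tau] * lam_l[t - tau] for tau in range(min(t, L - 1) + 1))
```

Linking strata would then amount to letting one stratum's \lambda^l feed another's observation process (or generation-time sum), rather than each evolving independently as here.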

There may also be other ways of doing this. The downside of this approach is that it would model the scaling relationship at the time of report rather than at latent time (i.e. infection), which might not be suitable for all use cases (though this could be made more flexible). It would also require much greater control over priors than we currently have, as well as a lot more machinery for flexible generation times (and for estimating variation in them).

Having this in place would also put us in a good position to start modelling other forms of data, though that would definitely take quite a bit more thought.

To really get started on this we would ideally have some example data that show these features, and potentially a simple simulator that can generate a few target scenarios.
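As a starting point for such a simulator, a toy version could generate a delayed, scaled secondary indicator from the primary and then apply right truncation to both by drawing per-event reporting delays and discarding anything reported after a snapshot date. All names, distributions, and parameters below are hypothetical choices for the sketch:

```python
import numpy as np

rng = np.random.default_rng(42)

# True (complete) primary counts, and a secondary indicator that is a
# delayed, scaled version of the primary (assumed parameters throughout).
T = 80
P_true = rng.poisson(50, size=T)
S_true = np.zeros(T, dtype=int)
for t in range(T):
    n_sec = rng.binomial(P_true[t], 0.2)      # scaling alpha = 0.2
    delays = rng.geometric(0.3, size=n_sec)   # primary-to-secondary delay >= 1
    for d in delays:
        if t + d < T:
            S_true[t + d] += 1

def right_truncate(counts, snapshot, p_report):
    """Toy right truncation: each event gets an iid geometric reporting
    delay and is only observed if reported by the snapshot date."""
    observed = np.zeros_like(counts)
    for t in range(min(snapshot + 1, len(counts))):
        report_delays = rng.geometric(p_report, size=counts[t]) - 1
        observed[t] = int(np.sum(t + report_delays <= snapshot))
    return observed

# Different reporting delay distributions per indicator, as would be
# typical for e.g. cases vs deaths.
snapshot = 70
P_obs = right_truncate(P_true, snapshot, p_report=0.5)
S_obs = right_truncate(S_true, snapshot, p_report=0.3)
```

Target scenarios could then be generated by varying the scaling, the primary-to-secondary delay, and the two reporting delay distributions, giving data with exactly the features (linked indicators, both right truncated) that the proposed model would need to recover.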