@FelixGuenther drew our attention to a potential idiosyncrasy of line list data that he came across during his nowcasting work: the reference date (e.g. symptom onset date) may be entered “retrospectively”, i.e. a case may first be reported with a missing symptom onset date, which is only added later.

If this happens regularly, the reporting triangle (including missing delay as a special cell) is not stable over time. Essentially, the share of cases with a missing reference date depends on the date of report, because cases reported closer to the present are more likely to (still) lack a symptom onset date.

This has direct implications for the missingness model envisioned / almost finalized for epinowcast, where we model the share of cases with missing symptom onset only conditional on the date of reference. If there is delayed entry of reference dates, then even if the eventual share of missing cases were constant over time, we would observe an increase in the share of missing cases towards the present, both by date of report and by date of reference.
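To make the mechanism concrete, here is a minimal sketch of the expected share of cases still lacking an onset date, by days since the case was reported. It only illustrates the argument above and is not part of epinowcast. The function name, the eventual share of 0.3 and the entry-delay distribution are all made-up assumptions.

```python
import numpy as np


def observed_missing_share(eventual_share, entry_delay_pmf, horizon):
    """Expected share of cases without an onset date, by days since report.

    eventual_share: share of cases that never receive an onset date
        (assumed constant over time).
    entry_delay_pmf: hypothetical P(onset date is entered d days after the
        case report), d = 0, 1, ...; assumed to sum to 1.
    horizon: number of report dates to look back from the present.
    """
    cdf = np.cumsum(np.asarray(entry_delay_pmf, dtype=float))
    ages = np.arange(horizon)  # 0 = reported today, 1 = reported yesterday, ...
    # Share of eventually-known onset dates that have been entered by now;
    # beyond the maximum entry delay, everything has been entered.
    entered = np.where(ages < len(cdf), cdf[np.minimum(ages, len(cdf) - 1)], 1.0)
    return eventual_share + (1.0 - eventual_share) * (1.0 - entered)


# Illustrative values only: 30% of cases never get an onset date.
shares = observed_missing_share(0.3, [0.5, 0.25, 0.15, 0.1], horizon=7)
```

Here `shares[0]` is the most recent report date. The observed share is highest there and falls to the constant eventual share once the report is older than the longest entry delay.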

For the current missingness model, this means two things:

- A stationary time series prior on the share of missing cases by date of reference would lead to an underestimation of missingness towards the present. A time series prior with a trend could reduce or avoid this bias; however, we have not yet discussed which type of trend (linear trend on the logit scale?) would be most suitable.
- Compared to imputing missing symptom onset dates by estimating the backward delay distribution, the generative missingness model would be less precise because it depends on modeling the share of missing cases by date of reference over time and cannot condition on the date of report. Of course, the estimation of backward delays also has its own challenges (dependence on the epidemic curve), so it is unclear which would be more precise overall.
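For reference, a linear trend on the logit scale, as floated in the first point, could look like the sketch below. It only shows the functional form. The function names and parameter values are hypothetical, and this is not how epinowcast parameterises its missingness model.

```python
import numpy as np


def logit(p):
    return np.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))


def missing_share_linear_logit(intercept, slope, n_days):
    """Share of cases with missing onset date by reference date index.

    The share follows logit(share_t) = intercept + slope * t, so it stays
    strictly between 0 and 1 for any real intercept and slope.
    """
    t = np.arange(n_days)
    return inv_logit(intercept + slope * t)


# Illustrative values only: start at 30% missing, drifting upwards towards
# the present (larger t = more recent reference date).
trend = missing_share_linear_logit(logit(0.3), slope=0.1, n_days=10)
```

A positive slope lets the share rise towards the present, which a stationary prior would pull back towards its long-run mean.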

The above points apply to a situation in which we just have the reporting triangle as data. If we can get additional data about the delay with which reference dates are recorded, we could also consider extending the nowcasting framework to several “dates of report” (e.g. date of report 1 = reporting of case, date of report 2 = reporting of reference date). This modeling of “higher-dimensional” reporting triangles is also closely connected to other idiosyncrasies of line list data, such as retrospective deletion of cases / editing of dates. See also our discussion here.
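As a rough sketch of what such a “higher-dimensional” reporting triangle could look like as data, the snippet below counts cases by reference date, delay until the case is reported, and further delay until the reference date is entered. The tuple layout, function name and example cases are assumptions for illustration only, not an existing epinowcast data format.

```python
import numpy as np


def build_extended_triangle(cases, n_days, max_delay):
    """Count cases by (reference day, case report delay, entry delay).

    cases: iterable of (reference_day, report_day, entry_day) integer tuples,
        with reference_day and entry_day set to None if the reference date
        was never entered (hypothetical layout).
    Returns counts[t, d1, d2] and missing[r], where missing[r] counts cases
    reported on day r that never received a reference date.
    """
    counts = np.zeros((n_days, max_delay + 1, max_delay + 1), dtype=int)
    missing = np.zeros(n_days, dtype=int)
    for reference_day, report_day, entry_day in cases:
        if reference_day is None:
            missing[report_day] += 1
            continue
        d1 = report_day - reference_day  # date of report 1: case is reported
        d2 = entry_day - report_day      # date of report 2: reference date is entered
        if 0 <= d1 <= max_delay and 0 <= d2 <= max_delay:
            counts[reference_day, d1, d2] += 1
    return counts, missing


# Made-up example cases.
cases = [
    (0, 1, 1),        # onset day 0, reported day 1, onset date entered with the report
    (0, 1, 3),        # same, but onset date added two days after the report
    (2, 2, 2),        # reported and fully recorded on the onset day
    (None, 3, None),  # onset date never entered
]
counts, missing = build_extended_triangle(cases, n_days=4, max_delay=3)
```

Summing `counts` over the last axis gives back the usual reporting triangle by reference date and report delay.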

Some take-aways from this discussion:

- Simulations, or real data generated by the described process, would be very helpful for studying the issue in more detail.
- Adjustments to the missingness model in epinowcast, such as a non-stationary time series prior, may be sensible.
- There could be a lot of value in modeling higher-dimensional reporting “triangles”, provided we have appropriate data (and computing resources).