Epinowcast - Filtering by earliest observed report date - separate function

jimrothstein · 8 January 2024 06:47

enw_add_incidence()

Filtering by earliest observed report date as separate function

opened 05:15PM - 09 Aug 23 UTC

enhancement good first issue low-priority

As discussed in #224: The part ``` reports <- reports[, .SD[reference_date >= min(report_date) | is.na(reference_date)], by = by ] ``` in `enw_add_incidence()` filters by earliest observed report date. This isn't super nice as it is rather like an unexpected side effect that is not directly related to adding incidence/diff_obs to the obs. It might be better to move this out of `enw_add_incidence()`, into a separate function.

I am working on this “good first issue”. One line of existing code is in question:

reports <- reports[,
  .SD[reference_date >= min(report_date) | is.na(reference_date)],
  by = by
]
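As a rough sketch of what the separation proposed in the issue could look like, the line above can be wrapped in a standalone helper. The function name `filter_by_earliest_report()`, its signature, and the toy data are assumptions made for illustration only; they are not the package's API and not necessarily what the eventual fix looks like.

```r
library(data.table)

# Hypothetical helper: name and signature are illustrative, not package API.
# Keeps rows whose reference_date is on or after the earliest report_date
# (within each `by` group, if given), plus rows with a missing reference_date.
filter_by_earliest_report <- function(obs, by = NULL) {
  # Work on a copy so the caller's table is not modified by reference.
  reports <- data.table::copy(data.table::as.data.table(obs))
  reports[,
    .SD[reference_date >= min(report_date) | is.na(reference_date)],
    by = by
  ]
}

# Made-up toy data: the earliest report_date is 2021-10-03.
obs <- data.table(
  reference_date = as.Date(c("2021-10-01", "2021-10-03", "2021-10-04", NA)),
  report_date = as.Date(
    c("2021-10-03", "2021-10-03", "2021-10-05", "2021-10-05")
  ),
  confirm = c(5, 3, 2, 1)
)

filtered <- filter_by_earliest_report(obs)
print(filtered)
```

In this toy table only the first row is dropped, because its reference date precedes the earliest report date; the row with a missing reference date is kept.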

I’m reading to get up to speed, but the various dates have me a little confused.
My understanding is that the reference date is the date of the first positive test for a specific individual. Wouldn’t it be a data error if report_date came BEFORE the reference date?

Likewise, if is.na(reference_date) is TRUE, this too seems like a data problem.
Can you refer me to sample data examples so I can understand the issues better?
Thx.

Thanks.

samabbott · 4 April 2024 17:38

Thanks for this @jimrothstein, and sorry it took so long to get back to you! For others: this is now being addressed in "ISSUE 305: first attempt to put reference_date >= min(report_date) into separate…" by jimrothstein (Pull Request #430, epinowcast/epinowcast on GitHub).

My understanding is that the reference date is the date of the first positive test for a specific individual. Wouldn’t it be a data error if report_date came BEFORE the reference date?

This is possible in retrospective aggregate count datasets, which many users will have. I think in this instance the filter is a correction to catch that.

Likewise, if is.na(reference_date) is TRUE, this too seems like a data problem.
Can you refer me to sample data examples so I can understand the issues better?
Thx.

This is because we are using aggregate counts rather than individual-level data, so it is possible that some people are missing reference dates (and this is then something we support modelling). To get a handle on this I suggest looking at the package vignettes or the example scripts in inst/examples.
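To make the aggregate-count point concrete, here is a small made-up table, not taken from the package data. The column name `confirm` and all of the numbers are assumptions for illustration. Each row is a count for a reference date as known on a report date, so a row with a missing reference date is a count of cases whose reference date is unknown, not a corrupted record.

```r
library(data.table)

# Made-up aggregate counts: rows are counts, not individual cases.
obs <- data.table(
  reference_date = as.Date(
    c("2021-09-28", "2021-10-01", "2021-10-01", "2021-10-02", NA, NA)
  ),
  report_date = as.Date(
    c("2021-10-01", "2021-10-01", "2021-10-02", "2021-10-02",
      "2021-10-01", "2021-10-02")
  ),
  confirm = c(12, 4, 7, 3, 2, 5)
)

# Earliest date on which anything was reported in this dataset.
earliest_report <- min(obs$report_date)

# Counts with an unknown reference date.
missing_reference <- obs[is.na(reference_date)]

# Reference dates that precede the first report in the data.
before_first_report <- obs[
  !is.na(reference_date) & reference_date < earliest_report
]

# What the filter under discussion keeps: on-or-after rows plus missing ones.
retained <- obs[reference_date >= earliest_report | is.na(reference_date)]

print(before_first_report)
print(retained)
```

Here the 2021-09-28 row is the only one removed, while both rows with a missing reference date are retained.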

jimrothstein · 4 April 2024 22:56

I defer to the others. But if there is something here or elsewhere I can do, please let me know.
jim
