Do your evaluations have enough power?

samabbott · 27 January 2026 14:24

Yes, I totally agree with this, but I also don't trust the vast majority of evaluations currently being done that look at lots of different strata in the data in an ad-hoc way, so my threshold for thinking we should use a model is much lower.
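As one minimal illustration of what "using a model" across strata could look like (an assumption on my part, not a specific method from this thread), here is a sketch that partially pools per-stratum mean score differences toward the overall mean with a simple method-of-moments shrinkage estimate. The function name `shrink_stratum_means` is hypothetical, and the sketch assumes roughly equal sample sizes per stratum and score differences defined as model A minus model B.

```python
import statistics


def shrink_stratum_means(stratum_diffs):
    """Partially pool per-stratum mean score differences (hypothetical helper).

    stratum_diffs: list of lists, each holding per-forecast score differences
    (model A minus model B) for one stratum, e.g. one location or horizon.
    Assumes roughly equal sample sizes, so a single shrinkage weight is used.
    """
    # Raw (ad-hoc) per-stratum means.
    means = [statistics.fmean(d) for d in stratum_diffs]
    # Average sampling variance of a stratum mean.
    se2 = statistics.fmean([statistics.variance(d) / len(d) for d in stratum_diffs])
    grand = statistics.fmean(means)
    # Between-stratum variance: spread of the means minus sampling noise, floored at 0.
    tau2 = max(statistics.variance(means) - se2, 0.0)
    # Weight on each stratum's own mean; 0 means full pooling to the grand mean.
    weight = tau2 / (tau2 + se2) if (tau2 + se2) > 0 else 0.0
    return [grand + weight * (m - grand) for m in means]
```

When most of the apparent between-stratum variation is just sampling noise, the weight goes towards zero and the "differences" between strata largely disappear, which is the behaviour ad-hoc per-stratum comparisons lack.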

I need to engage with the MCS literature more, but on a first pass I assume it can also be represented in a model framework, which would be handy because, again, it means you have just a single set of tools to learn and develop best practices for.

Yup, this is the big problem, right? It depends on how you set them up, and, as you say, it is one of the arguments @nikosbosse gave for why a transformed score is nice.

Again, something I wonder is how often we have similar problems when we reason about a model's performance by, e.g., location and horizon using graphs. It would be interesting to try to unpick whether the model setup just makes a more common problem obvious.
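To make the "more common problem" concrete, here is a rough simulation sketch (my own illustration, with hypothetical names and made-up settings, not an analysis from this thread). Two models are equally good by construction, yet if you scan many strata (say locations or horizons) for a "clear" difference, at least one usually turns up. It assumes normally distributed per-forecast score differences and uses a crude z-test at roughly the 5% level per stratum.

```python
import random
import statistics


def spurious_stratum_wins(n_strata=20, n_per_stratum=30, n_sims=500, seed=1):
    """Fraction of simulations in which at least one stratum shows a
    'significant' difference between two models that are equally good.

    Hypothetical helper: score differences are drawn as N(0, 1), so any
    detected difference is a false positive.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        found = False
        for _ in range(n_strata):
            # Per-forecast score differences (model A minus model B); true mean is 0.
            diffs = [rng.gauss(0.0, 1.0) for _ in range(n_per_stratum)]
            mean = statistics.fmean(diffs)
            se = statistics.stdev(diffs) / n_per_stratum ** 0.5
            # Crude two-sided z-test at roughly the 5% level.
            if abs(mean / se) > 1.96:
                found = True
        if found:
            hits += 1
    return hits / n_sims
```

With a single stratum the false-positive rate sits near the nominal 5%, but with 20 strata it is well over half, roughly in line with 1 - 0.95^20 ≈ 0.64. Eyeballing location-by-horizon plots is arguably the same search, just without anyone counting the comparisons.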
