Transform and aggregate scoring rules

I was just taking another look at Pic et al. 2025, and this time engaged more deeply, beyond noting that it would have been nice to see them cite @nikosbosse more (:smile: ).

Claude dug out a bunch of references, some of which I was aware of and some of which I was not. It reminded me of the idea of composing scores together to build custom scores for different settings, which I rather like.
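To make the composition idea concrete, here is a minimal Python sketch of the transform-then-aggregate pattern: apply scalar-valued transformations to a multivariate forecast, score each with a univariate rule, and combine with weights. This is purely illustrative on my part (the function names, the choice of CRPS as the base rule, and the transforms are my own assumptions, not scoringutils code or the exact construction in the papers):

```python
import numpy as np

def crps_ensemble(obs, ens):
    """Standard sample CRPS estimator for a scalar observation `obs`
    and a 1-d array of ensemble draws `ens`."""
    return (np.mean(np.abs(ens - obs))
            - 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :])))

def transform_aggregate_score(obs_vec, ens_mat, transforms, weights):
    """Transform-and-aggregate score: each transform maps R^d -> R,
    each transformed forecast is scored with CRPS, and the per-transform
    scores are combined as a weighted sum.

    obs_vec: (d,) observed vector
    ens_mat: (m, d) matrix of m ensemble members
    """
    scores = [
        crps_ensemble(t(obs_vec), np.apply_along_axis(t, 1, ens_mat))
        for t in transforms
    ]
    return float(np.dot(weights, scores))

# Example: score the total and the maximum across dimensions, equally weighted.
obs = np.array([1.0, 2.0])
ens = np.array([[0.0, 0.0], [2.0, 4.0]])
score = transform_aggregate_score(obs, ens, [np.sum, np.max], [0.5, 0.5])
```

The appeal is that the choice of transforms encodes what you care about (totals, peaks, pairwise structure), while the base rule and weights stay fixed.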

There are a few papers here that I mean to take a look at, but I was wondering if anyone had any thoughts. If enough people take a look, we should have a call about any actions to take in terms of updating our practice.

The Claude issue, with links out (who knows, maybe some are dreams), is here: Support aggregation-and-transformation scoring framework for multivariate forecasts · Issue #1120 · epiforecasts/scoringutils · GitHub

This was motivated by looking through literature to implement variogram scores on prompting from @nickreich (Implement variogram scores · Issue #1111 · epiforecasts/scoringutils · GitHub).
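The variogram score itself fits this aggregate-and-transform pattern: it transforms a multivariate forecast into pairwise absolute differences and aggregates squared errors over them. Here is a minimal sketch of the ensemble version from Scheuerer & Hamill (2015), with equal pairwise weights by default; it is my own illustration, not the implementation proposed in #1111:

```python
import numpy as np

def variogram_score(obs, ens, p=0.5, w=None):
    """Variogram score of order p.

    obs: (d,) observed vector
    ens: (m, d) matrix of m ensemble members
    w:   (d, d) matrix of pairwise weights (defaults to all ones)
    """
    d = obs.shape[0]
    if w is None:
        w = np.ones((d, d))
    # Observed pairwise differences |y_i - y_j|^p, shape (d, d).
    obs_diff = np.abs(obs[:, None] - obs[None, :]) ** p
    # Ensemble estimate of E|X_i - X_j|^p, averaged over members.
    ens_diff = np.mean(np.abs(ens[:, :, None] - ens[:, None, :]) ** p, axis=0)
    return float(np.sum(w * (obs_diff - ens_diff) ** 2))
```

Viewed this way, the "transform" is the pairwise-difference map and the "aggregate" is the weighted sum of squared errors, which is what makes it a natural fit for the framework in #1120.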

(This all relates to the ideas in Do your evaluations have enough power? - #17 by samabbott about having a common standard of best practice with a clear mechanism for iterating on it.) Currently that standard exists, in some sense, via the software, but of course not completely (i.e. how you aggregate scores, etc.).


Also, if we agree with their reasoning, it suggests the multiple-stage method proposed in Baseball Stats, Model Cards, and Forecasting Performance would be a good idea, as it fits in nicely with this.

The Pareto analysis conducted in Baseball Stats, Model Cards, and Forecasting Performance - #16 by samabbott by @jack, @mariatang, and @jonathon.mellor uses a compound scoring rule tuned to the target audience, as advocated in Allen et al., which is interesting.
