Transform and aggregate scoring rules

I was just taking another look at Pic et al. 2025, and this time engaged more deeply, beyond noting that it would have been nice to see them cite @nikosbosse more (:smile: ).

Claude dug out a bunch of references, some of which I was aware of and some of which I was not. It reminded me of the idea of composing scores together to build custom scores for different settings, which I rather like.
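To make the composition idea concrete, here is a minimal Python sketch of the transform-then-aggregate pattern: apply scalar-valued transformations to a multivariate forecast, score each with a univariate rule, and combine with weights. This is purely illustrative on my part (the function names, the choice of CRPS as the base rule, and the transforms are my own assumptions, not scoringutils code or the exact construction in the papers):

```python
import numpy as np

def crps_ensemble(obs, ens):
    """Standard sample CRPS estimator for a scalar observation `obs`
    and a 1-d array of ensemble draws `ens`."""
    return (np.mean(np.abs(ens - obs))
            - 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :])))

def transform_aggregate_score(obs_vec, ens_mat, transforms, weights):
    """Transform-and-aggregate score: each transform maps R^d -> R,
    each transformed forecast is scored with CRPS, and the per-transform
    scores are combined as a weighted sum.

    obs_vec: (d,) observed vector
    ens_mat: (m, d) matrix of m ensemble members
    """
    scores = [
        crps_ensemble(t(obs_vec), np.apply_along_axis(t, 1, ens_mat))
        for t in transforms
    ]
    return float(np.dot(weights, scores))

# Example: score the total and the maximum across dimensions, equally weighted.
obs = np.array([1.0, 2.0])
ens = np.array([[0.0, 0.0], [2.0, 4.0]])
score = transform_aggregate_score(obs, ens, [np.sum, np.max], [0.5, 0.5])
```

The appeal is that the choice of transforms encodes what you care about (totals, peaks, pairwise structure), while the base rule and weights stay fixed.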

There are a few papers here that I mean to take a look at, but I was wondering if anyone had any thoughts. If enough people take a look, we should have a call about any actions to take in terms of updating our practice.

The Claude issue, with links out (who knows, maybe some are dreams), is here: Support aggregation-and-transformation scoring framework for multivariate forecasts · Issue #1120 · epiforecasts/scoringutils · GitHub

This was motivated by looking through literature to implement variogram scores on prompting from @nickreich (Implement variogram scores · Issue #1111 · epiforecasts/scoringutils · GitHub).
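The variogram score itself fits this aggregate-and-transform pattern: it transforms a multivariate forecast into pairwise absolute differences and aggregates squared errors over them. Here is a minimal sketch of the ensemble version from Scheuerer & Hamill (2015), with equal pairwise weights by default; it is my own illustration, not the implementation proposed in #1111:

```python
import numpy as np

def variogram_score(obs, ens, p=0.5, w=None):
    """Variogram score of order p.

    obs: (d,) observed vector
    ens: (m, d) matrix of m ensemble members
    w:   (d, d) matrix of pairwise weights (defaults to all ones)
    """
    d = obs.shape[0]
    if w is None:
        w = np.ones((d, d))
    # Observed pairwise differences |y_i - y_j|^p, shape (d, d).
    obs_diff = np.abs(obs[:, None] - obs[None, :]) ** p
    # Ensemble estimate of E|X_i - X_j|^p, averaged over members.
    ens_diff = np.mean(np.abs(ens[:, :, None] - ens[:, None, :]) ** p, axis=0)
    return float(np.sum(w * (obs_diff - ens_diff) ** 2))
```

Viewed this way, the "transform" is the pairwise-difference map and the "aggregate" is the weighted sum of squared errors, which is what makes it a natural fit for the framework in #1120.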

(This all relates to the ideas in Do your evaluations have enough power? - #17 by samabbott about having a common standard of best practice with a clear mechanism for iterating on it.) Currently that standard exists, in some sense, via the software, but of course not completely (i.e. how you aggregate scores, etc.).


Also, if we agree with their reasoning, it suggests the multiple-stage method proposed in Baseball Stats, Model Cards, and Forecasting Performance would be a good idea, as it fits in nicely with this.

The Pareto analysis conducted in Baseball Stats, Model Cards, and Forecasting Performance - #16 by samabbott by @jack, @mariatang, and @jonathon.mellor uses a compound scoring rule tuned to the target audience, as advocated in Allen et al., which is interesting.
