Relative score aggregation - harmonic mean?

Hello,

Bit niche, but I had a question about the way we aggregate relative skill scores for forecasts, and hoped people here would be well placed to help. TL;DR: might the harmonic mean be a useful average for summarising relative, proportional scores (e.g. relative WIS)?

Background (the long version):

When evaluating forecast skill, we sometimes want to summarise scores across an unbalanced sample of forecasters, where not every forecaster has forecast every one of many different targets. We have been handling this with the relative skill score. This compares each forecaster's scores pairwise against each other forecaster's, on the targets both have forecast, and then takes the geometric mean of these pairwise ratios for each forecaster.
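To make the pairwise setup concrete, here is a minimal sketch, assuming each pairwise ratio is the mean score of one forecaster divided by the mean score of the other over their shared targets (lower scores are better, as with WIS). The data, the forecaster names and the helper names `pairwise_ratio` and `relative_skill` are hypothetical, and any rescaling against a baseline model is left out.

```python
from statistics import fmean, geometric_mean

# Hypothetical data: scores[forecaster][target] = score (lower is better, e.g. WIS).
# Not every forecaster covers every target, so the sample is unbalanced.
scores = {
    "A": {"t1": 10.0, "t2": 20.0, "t3": 30.0},
    "B": {"t1": 20.0, "t2": 40.0},
    "C": {"t2": 10.0, "t3": 15.0},
}


def pairwise_ratio(i, j):
    """Mean score of i divided by mean score of j, over the targets both forecast."""
    common = sorted(scores[i].keys() & scores[j].keys())
    if not common:
        return None  # no overlap, so this pair cannot be compared
    return fmean(scores[i][t] for t in common) / fmean(scores[j][t] for t in common)


def relative_skill(i):
    """Geometric mean of i's pairwise ratios against every forecaster it overlaps with."""
    ratios = []
    for j in scores:
        if j != i:
            r = pairwise_ratio(i, j)
            if r is not None:
                ratios.append(r)
    return geometric_mean(ratios)


for name in scores:
    print(name, relative_skill(name))
```

Here A scores half as much as B on their shared targets but twice as much as C on theirs, so its relative skill comes out at 1.0. Note that each ratio is built from a different set (and number) of shared targets.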

If* I understand correctly, when we then want to summarise again across several forecasters' relative scores, the method of aggregation isn't obvious. Taking either the arithmetic or the geometric mean across relative scores means averaging ratios that are unbalanced (each potentially based on a different number of targets). This might lose the propriety of the resulting summary. [*big if]
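As a small illustration of why the choice matters, the sketch below averages three hypothetical relative skill scores. Neither unweighted mean looks at how many targets sit behind each ratio. It also shows one known property of ratios: the arithmetic mean depends on which way round the ratio is taken, whereas the geometric mean does not. The values are invented for illustration only.

```python
from statistics import fmean, geometric_mean

# Hypothetical relative skill scores for three forecasters. Imagine they rest on
# 200, 50 and 3 targets respectively: neither mean below uses those counts.
rel_skill = [0.5, 1.0, 2.0]
reciprocals = [1 / r for r in rel_skill]  # the same comparisons, flipped

arith = fmean(rel_skill)                  # about 1.167: "worse than reference"
arith_flipped = fmean(reciprocals)        # also about 1.167 the other way round
geo = geometric_mean(rel_skill)           # 1.0
geo_flipped = geometric_mean(reciprocals) # 1.0, i.e. exactly 1 / geo

print(arith, arith_flipped, geo, geo_flipped)
```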

I was reading about the harmonic mean in a different context recently and wondered if it could help here. It is the reciprocal of the arithmetic mean of the reciprocals, and is often recommended for averaging rates and ratios.
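A minimal sketch of that definition, checked against Python's built-in `statistics.harmonic_mean` and applied to the same hypothetical relative skill scores as above. The helper name `harmonic` is purely illustrative, and all inputs are assumed to be positive.

```python
from statistics import fmean, geometric_mean, harmonic_mean


def harmonic(xs):
    """Reciprocal of the arithmetic mean of the reciprocals (inputs must be positive)."""
    return 1 / fmean([1 / x for x in xs])


rel_skill = [0.5, 1.0, 2.0]  # hypothetical relative skill scores

h = harmonic(rel_skill)       # 3 / 3.5, about 0.857
h_std = harmonic_mean(rel_skill)
g = geometric_mean(rel_skill)
a = fmean(rel_skill)

print(h, h_std, g, a)         # harmonic <= geometric <= arithmetic
```

One property worth noting for relative scores: by construction, the harmonic mean of a set of ratios equals 1 divided by the arithmetic mean of the flipped ratios, so it is the arithmetic mean viewed from the other side of the comparison.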

I saw it explained by analogy to the “average speed” of a vehicle that travels equal distances at different speeds (e.g. 20mph for 2 miles, then 100mph for the next 2 miles). The arithmetic mean is (20mph + 100mph) / 2 = 60mph; the harmonic mean is 4 miles / (1/20 + 1/20 + 1/100 + 1/100) hours ≈ 33.3mph, which is the true average speed (total distance / total time). It’s also used in finance, for averaging price/earnings multiples across an unbalanced portfolio of different companies.

This all reminded me of the issue of averaging relative skill scores, and I wondered if the harmonic mean might be a neat way to summarise across relWIS.