Do your evaluations have enough power?

This is a great question. My sense from the hub work etc. is that evaluations are likely underpowered, but it would be good to think about this more carefully, and even more so to develop some guidance on how to address it when reporting forecast scores.

One idea I think we discussed in the past is Model Confidence Sets, i.e. sets of models whose forecast performance is statistically indistinguishable at a given confidence level, so that no single "best" model can be identified. There seems to be some active work on this, with applications to COVID forecasts in Sequential model confidence sets and to forecasts during particular phases in Conditional model confidence sets.
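To make the idea concrete, here is a minimal sketch of an MCS-style elimination procedure in Python. It is a simplified, illustrative version (not the full studentized statistic from Hansen et al., 2011, and not from either of the papers above): given a matrix of loss values per model, it bootstraps the distribution of the maximum deviation from the average loss under the hypothesis of equal forecast ability, and eliminates the worst model until no further rejection occurs. The function name and all details are my own for illustration.

```python
import numpy as np

def model_confidence_set(scores, alpha=0.1, n_boot=2000, seed=0):
    """Simplified Model Confidence Set via bootstrap elimination.

    scores: (n_forecasts, n_models) array of losses (lower = better),
            e.g. WIS or log-score values per forecast target.
    Returns the column indices of models that survive at level alpha.
    """
    rng = np.random.default_rng(seed)
    n, _ = scores.shape
    surviving = list(range(scores.shape[1]))
    while len(surviving) > 1:
        s = scores[:, surviving]                  # (n, k) losses of remaining models
        dbar = s.mean(axis=0) - s.mean()          # each model's deviation from the overall mean loss
        # bootstrap the max deviation under the null of equal ability
        # (centering removes each model's own mean, imposing the null)
        centered = s - s.mean(axis=0)
        t_boot = np.empty(n_boot)
        for b in range(n_boot):
            idx = rng.integers(0, n, n)           # resample forecast targets with replacement
            sb = centered[idx]
            t_boot[b] = (sb.mean(axis=0) - sb.mean()).max()
        p = (t_boot >= dbar.max()).mean()
        if p < alpha:
            # reject equal ability: drop the model with the largest mean loss
            surviving.remove(surviving[int(np.argmax(s.mean(axis=0)))])
        else:
            break
    return surviving
```

In a real application you would want the studentized statistic and a block bootstrap to handle the serial dependence typical of epidemic forecast scores; the `arch` Python package ships a proper implementation (`arch.bootstrap.MCS`).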
