Baseball Stats, Model Cards, and Forecasting Performance

Hmm, I guess the replacement is the alternative baseline model I might submit instead. The issue is that if you don’t adjust for this, you end up conflating the value of a given model with the value an ensemble gets from simply having more models drawn from some error distribution, i.e. the effect that the “how many models” style of analyses look at.

Also, I would lightly push back on the idea that models are “unique”, as they are often quite derivative of each other, right? Especially in larger hubs. I think this replacement idea also starts to get at that a bit, since in some sense it adjusts for model weight to give the uniqueness value (or lol maybe it doesn’t, who knows).
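To make the replacement idea a bit more concrete, here is a rough sketch of how I picture it, not anything a hub actually runs. It compares a plain leave-one-out contribution (drop the model, so the ensemble shrinks) against a value-over-replacement contribution (swap the model for the baseline, so the ensemble size stays fixed). Everything here is an assumption for illustration: the function names, the equal-weight mean ensemble, the use of mean absolute error as the score, and the toy numbers.

```python
import numpy as np


def mae(pred, truth):
    """Mean absolute error of one forecast vector against the truth."""
    return float(np.mean(np.abs(pred - truth)))


def ensemble_mean(forecasts):
    """Equal-weight mean ensemble; forecasts has shape (n_models, n_targets)."""
    return np.mean(forecasts, axis=0)


def leave_one_out_value(forecasts, truth, i):
    """Ensemble error without model i minus error with it.

    Positive means the ensemble is better with model i included, but this
    also picks up the effect of simply having one more model.
    """
    without = np.delete(forecasts, i, axis=0)
    return mae(ensemble_mean(without), truth) - mae(ensemble_mean(forecasts), truth)


def value_over_replacement(forecasts, truth, baseline, i):
    """Ensemble error with model i swapped for the baseline minus actual error.

    The ensemble size is held fixed, so the number-of-models effect cancels.
    """
    swapped = forecasts.copy()
    swapped[i] = baseline
    return mae(ensemble_mean(swapped), truth) - mae(ensemble_mean(forecasts), truth)


# Toy data (hypothetical): three models, four targets, truth is all zeros.
truth = np.zeros(4)
baseline = np.full(4, 1.0)
forecasts = np.array([
    np.full(4, 2.0),   # model 0: biased high
    np.full(4, -2.0),  # model 1: biased low, offsets model 0
    np.full(4, 1.0),   # model 2: a copy of the baseline
])

for i in range(len(forecasts)):
    loo = leave_one_out_value(forecasts, truth, i)
    vor = value_over_replacement(forecasts, truth, baseline, i)
    print(f"model {i}: leave-one-out={loo:.3f}, over-replacement={vor:.3f}")
```

In this toy setup the model that is just a copy of the baseline gets a value over replacement of exactly zero, which is the kind of "derivative model" adjustment I mean. Whether that carries over to weighted ensembles or proper scoring rules is a separate question.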