Do your evaluations have enough power?

In terms of standardising things, this is where the idea of doing something like Baseball Stats, Model Cards, and Forecasting Performance came from, and I still think that would be neat and achievable.
