I really enjoyed this recent preprint on hurricane forecasting from the Google DeepMind folks: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/how-we-re-supporting-better-tropical-cyclone-prediction-with-ai/skillful-joint-probabilistic-weather-forecasting-from-marginals.pdf
Aside from the interesting modelling, what really struck me was the set of similar but distinct probabilistic forecasting metrics they use to evaluate their model. This relates to Do your evaluations have enough power? - #4 by samabbott in that it is clear other fields have different standards of practice.
In particular, I liked their different approaches to pooling CRPS and their calibration measure (centred on 1 for a well-calibrated ensemble).
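For anyone who hasn't run into these metrics, here's a rough numpy sketch of how I read the two ideas. To be clear, this is my reconstruction rather than their code: the fair CRPS estimator, the average-pooling window, and the spread/skill formulation of calibration are all assumptions on my part.

```python
import numpy as np

def crps_ensemble(ens, obs):
    """Fair CRPS estimator for an m-member ensemble.

    ens: array of shape (m, ...), obs: array of shape (...).
    Returns the CRPS at each grid point.
    """
    m = ens.shape[0]
    skill = np.abs(ens - obs).mean(axis=0)
    # pairwise term; i == j pairs contribute zero, so summing all pairs is fine
    spread = np.abs(ens[:, None] - ens[None, :]).sum(axis=(0, 1)) / (2 * m * (m - 1))
    return skill - spread

def avg_pool(field, k):
    """Average-pool the trailing two (spatial) dims with a k x k window."""
    h, w = field.shape[-2] // k, field.shape[-1] // k
    return (field[..., : h * k, : w * k]
            .reshape(*field.shape[:-2], h, k, w, k)
            .mean(axis=(-3, -1)))

def spread_skill_ratio(ens, obs):
    """~1 for a calibrated ensemble (with the small-m correction of Fortin et al. 2014)."""
    m = ens.shape[0]
    rmse = np.sqrt(np.mean((ens.mean(axis=0) - obs) ** 2))
    spread = np.sqrt(np.mean(ens.var(axis=0, ddof=1)))
    return np.sqrt((m + 1) / m) * spread / rmse

# toy check: a calibrated 8-member ensemble over a 32 x 32 grid
rng = np.random.default_rng(1)
truth = rng.normal(size=(32, 32))
obs = truth + rng.normal(size=(32, 32))
ens = truth + rng.normal(size=(8, 32, 32))

print(crps_ensemble(ens, obs).mean())                            # grid-point CRPS
print(crps_ensemble(avg_pool(ens, 4), avg_pool(obs, 4)).mean())  # pooled CRPS
print(spread_skill_ratio(ens, obs))                              # ~1 when calibrated
```

Running this on the toy calibrated ensemble gives a spread/skill ratio of roughly 1, which is what (I think) their calibration plots are centred on. The pooling step is the part I found most interesting: as I read it, scoring spatially averaged fields rewards getting the joint structure right, which per-grid-point (marginal) CRPS can't see.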
They also talk about a “scorecard”, which is very ML and reminded me of this: Baseball Stats, Model Cards, and Forecasting Performance
I was wondering about seeing if they might like to speak somewhere about this. epinowcast might not be a goer, but perhaps we could get them to come and give a talk at LSHTM, given they are just up the road at King's Cross?
