How well do WaterWatch's spill-probability predictions hold up against real outcomes? Precision, recall and F1 are computed only from evaluated predictions, meaning those for which we know whether a spill occurred.
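Assuming the standard definitions, the three headline metrics follow directly from the counts of true positives (TP), false positives (FP) and false negatives (FN):

```latex
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
```

Precision penalises spill warnings that turn out to be false alarms. Recall penalises spills the model failed to flag. F1 is their harmonic mean, so it is high only when both are high.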
Weekly precision, recall and F1 for the last 12 weeks
Sites are grouped into clusters based on their spill behaviour. Some clusters are inherently harder to predict.
Top 10 sites where the model over-predicts spills (≥ 20 evaluated predictions)
Top 10 sites with the highest balanced accuracy (≥ 20 evaluated predictions)
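This page does not define balanced accuracy. Assuming the standard definition, it averages recall on spill outcomes with specificity on no-spill outcomes:

```latex
\text{balanced accuracy} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right)
```

Because each outcome class carries equal weight, a site that rarely spills cannot score well just by always predicting no spill. That makes this metric a fairer basis for ranking sites than plain accuracy.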
Each evaluated prediction is classified as a true positive (spill predicted, spill occurred), false positive (spill predicted, no spill occurred), false negative (no spill predicted, spill occurred) or true negative (no spill predicted, no spill occurred). Only predictions with a known outcome (episode_occurred IS NOT NULL) are included. Per-site metrics require at least 20 evaluated predictions. Metrics are updated hourly. See the Intelligence dashboard for the full network view.
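The classification and per-site filtering described above can be sketched in Python roughly as follows. This is an illustration, not WaterWatch's implementation. The `Prediction` record, its `site_id` and `predicted_spill` fields, and the helper functions are all hypothetical. Only `episode_occurred` and the 20-prediction minimum come from this page. The sketch also assumes each spill probability has already been turned into a yes/no prediction, since the threshold is not stated here. It uses the standard formulas for precision, recall, F1 and balanced accuracy.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# Per-site minimum stated on the dashboard.
MIN_EVALUATED = 20


@dataclass(frozen=True)
class Prediction:
    site_id: str                      # hypothetical field name
    predicted_spill: bool             # hypothetical: probability already thresholded to yes/no
    episode_occurred: Optional[bool]  # None until the outcome is known


def confusion_counts(preds: Iterable[Prediction]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN), skipping predictions with an unknown outcome."""
    tp = fp = fn = tn = 0
    for p in preds:
        if p.episode_occurred is None:
            continue  # mirrors the episode_occurred IS NOT NULL filter
        if p.predicted_spill and p.episode_occurred:
            tp += 1
        elif p.predicted_spill:
            fp += 1
        elif p.episode_occurred:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def _ratio(num: float, den: float) -> float:
    # Convention for this sketch: an empty denominator yields 0.0.
    return num / den if den else 0.0


def metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Precision, recall, F1 and balanced accuracy from confusion counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)        # true-positive rate
    specificity = _ratio(tn, tn + fp)   # true-negative rate
    f1 = _ratio(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": (recall + specificity) / 2,
    }


def per_site_metrics(
    preds: Iterable[Prediction], min_evaluated: int = MIN_EVALUATED
) -> Dict[str, Dict[str, float]]:
    """Metrics for each site with at least `min_evaluated` evaluated predictions."""
    by_site: Dict[str, List[Prediction]] = defaultdict(list)
    for p in preds:
        if p.episode_occurred is not None:
            by_site[p.site_id].append(p)
    return {
        site: metrics(*confusion_counts(site_preds))
        for site, site_preds in by_site.items()
        if len(site_preds) >= min_evaluated
    }
```

Here a site with 25 evaluated predictions gets metrics, while one with only 10 is left out, matching the 20-prediction rule. Returning 0.0 when a denominator is zero is simply this sketch's convention. The dashboard may treat empty classes differently.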