Model Scrutiny

How well do WaterWatch's spill-probability predictions hold up against real outcomes? Precision, recall and F1 are computed from evaluated predictions, i.e. those where the outcome is known: a spill either occurred or it didn't.

Accuracy Over Time

Weekly precision, recall and F1 for the last 12 weeks

Performance by Behavioural Cluster

Sites are grouped into clusters based on their spill behaviour. Some clusters are inherently harder to predict.

Highest False-Positive Rate

Top 10 sites where the model over-predicts spills (≥ 20 evaluated predictions)
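The page doesn't spell out how the false-positive rate is computed; presumably it is the standard definition, false positives divided by all evaluated no-spill outcomes. A minimal sketch under that assumption (the function name is illustrative, not WaterWatch's code):

```python
def false_positive_rate(fp: int, tn: int) -> float:
    """Share of evaluated no-spill outcomes that the model flagged as spills.

    fp: predicted spill, no spill occurred (false positives)
    tn: no prediction, no spill occurred (true negatives)
    """
    return fp / (fp + tn) if fp + tn else 0.0
```

A site with 6 false positives against 24 true negatives would score 0.2, placing it high on this table even if its recall is good.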

Best F1 Score

Top 10 sites with the highest F1 score, the harmonic mean of precision and recall (≥ 20 evaluated predictions)

Methodology

Each prediction is classified as a true positive (predicted spill, spill occurred), false positive (predicted spill, no spill), false negative (no prediction, spill occurred), or true negative (no prediction, no spill). Only predictions where the outcome is known (episode_occurred IS NOT NULL) are included. A minimum of 20 evaluated predictions is required for per-site metrics. Metrics are updated hourly. See the Intelligence dashboard for the full network view.