I built WaterWatch because I was annoyed. Thames Water publishes live data from their sewage overflow sensor network — and plenty of people read it. They just use it wrong. A raw discharge count without historical context, without weather control, without a baseline, tells you almost nothing. So I built the context. Here is everything I found, and exactly how the platform that came out of it works.
Where the data comes from
Thames Water operates EDM sensors — Event Duration Monitors — at their combined sewer overflow sites across their network. Each sensor does one thing: it records the moment a discharge starts and the moment it stops. That is it. Simple, but powerful.
Thames Water makes this data available via a public API. WaterWatch polls that API every 15 minutes, around the clock, across all sites simultaneously. Every discharge event gets logged — start time, stop time, duration — and stored in a database that now goes back to April 2022. As of today, that is over 77,000 confirmed discharge episodes across around 500 sites.
What the sensors give us is duration and timing, nothing more. The sensor tells you a discharge was active from 14:32 to 17:08 on Tuesday. It does not tell you how much sewage flowed, how far it travelled, or what was in it. Every discharge figure on WaterWatch is derived from duration data alone: we do not estimate volumes, pollution loads, or ecological impact.
How episodes are detected and cleaned
Raw sensor data has noise. Sensors occasionally produce brief apparent activations during maintenance, telemetry glitches, or API hiccups. Short activations under a minimum threshold are flagged as likely noise and not counted as genuine discharge episodes. If a sensor briefly goes offline mid-discharge and comes back, the records are stitched together rather than counted as two separate events.
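To make the cleaning step concrete, here is a minimal sketch in Python. It is an illustration, not the production code: the `Episode` type, the function name, and both thresholds (`MIN_DURATION`, `MAX_STITCH_GAP`) are assumptions chosen for the example, and it approximates "the sensor briefly went offline" as a short gap between two consecutive records.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Both thresholds are illustrative assumptions, not WaterWatch's real settings.
MIN_DURATION = timedelta(minutes=5)     # shorter activations are treated as noise
MAX_STITCH_GAP = timedelta(minutes=30)  # shorter breaks are treated as a dropout


@dataclass
class Episode:
    start: datetime
    stop: datetime

    @property
    def duration(self) -> timedelta:
        return self.stop - self.start


def clean_episodes(raw: list[Episode]) -> list[Episode]:
    """Stitch records split by short dropouts, then drop likely noise."""
    stitched: list[Episode] = []
    for ep in sorted(raw, key=lambda e: e.start):
        if stitched and ep.start - stitched[-1].stop <= MAX_STITCH_GAP:
            # Treat this record as the same discharge resuming after a dropout.
            last = stitched[-1]
            stitched[-1] = Episode(last.start, max(last.stop, ep.stop))
        else:
            stitched.append(ep)
    # Filter after stitching so fragments of one long discharge are not discarded.
    return [ep for ep in stitched if ep.duration >= MIN_DURATION]
```

Stitching before filtering is a design choice in this sketch: if the order were reversed, a long discharge broken into short fragments by telemetry dropouts could be thrown away piece by piece.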
Sensor gaps — periods where the sensor was offline entirely — are tracked separately and shown explicitly on site pages. A gap in the data is not treated as a period of no-discharge. It is treated as unknown. We think this distinction matters a lot, and it is one most media coverage of this data misses entirely.
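The difference between "no discharge" and "unknown" is easy to lose in code, so here is a small hedged sketch of how a time window could be split three ways. The function names and the interval representation are hypothetical, and the sketch assumes discharge intervals and gap intervals never overlap each other.

```python
from datetime import datetime

Interval = tuple[datetime, datetime]


def overlap_hours(intervals: list[Interval], start: datetime, end: datetime) -> float:
    """Total hours of the given intervals that fall inside [start, end)."""
    seconds = 0.0
    for a, b in intervals:
        lo, hi = max(a, start), min(b, end)
        if hi > lo:
            seconds += (hi - lo).total_seconds()
    return seconds / 3600


def summarise_window(
    discharges: list[Interval],
    gaps: list[Interval],
    start: datetime,
    end: datetime,
) -> dict[str, float]:
    """Split a window into discharging, unknown, and known-dry hours."""
    window_h = (end - start).total_seconds() / 3600
    discharging_h = overlap_hours(discharges, start, end)
    unknown_h = overlap_hours(gaps, start, end)
    return {
        "discharging_h": discharging_h,
        "unknown_h": unknown_h,  # sensor offline: reported as unknown, never as zero
        "not_discharging_h": window_h - discharging_h - unknown_h,
    }
```

Under these assumptions, a 24-hour day with a 2-hour discharge and a 6-hour sensor gap comes out as 2 hours discharging, 6 hours unknown, and 16 hours confirmed dry, rather than 22 hours of "no discharge".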
“77,000 discharge episodes. Each one is a specific moment when sewage entered a specific river. That is not an estimate. That is a timestamp.”
Why raw spill counts aren't enough
Here is the problem that makes sewage data analysis genuinely hard: CSOs are designed to overflow when it rains. All else being equal, a dry year produces fewer spills and a wet year produces more. If you compare two years without accounting for rainfall, you are comparing the weather, not the infrastructure.
This is why WaterWatch built an environmental pressure scoring model. Instead of asking “how many times did this site overflow?”, we ask “how many times did it overflow relative to how much pressure it was under?” A site that overflowed 40 times in an exceptionally wet year might actually be performing better than a site that overflowed 30 times in a dry one.
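The arithmetic behind that comparison is simple. In the sketch below, the spill counts are the ones from the paragraph above, but the pressure totals (2,000 and 1,000 units) are invented purely for illustration, and the function name is hypothetical. The rate is expressed per 100 pressure units.

```python
def spills_per_100_pressure(spills: int, pressure_units: float) -> float:
    """Spill count normalised by the environmental pressure over the same period."""
    if pressure_units <= 0:
        raise ValueError("pressure_units must be positive")
    return 100 * spills / pressure_units


# Hypothetical pressure totals, chosen only to illustrate the comparison.
wet_year_rate = spills_per_100_pressure(spills=40, pressure_units=2000.0)  # 2.0
dry_year_rate = spills_per_100_pressure(spills=30, pressure_units=1000.0)  # 3.0

# The site with more raw spills has the lower rate once pressure is accounted for.
print(wet_year_rate < dry_year_rate)  # True
```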
The environmental pressure score
The score draws on rainfall, river level, and an Antecedent Wetness Index (AWI). Rainfall comes from the Environment Agency rain gauge nearest to each site, part of the same network used for flood warnings. River level data comes from EA river monitoring stations matched to each outfall. The AWI is the interesting one.
The idea behind AWI goes back to the antecedent precipitation index that Kohler and Linsley formalised in 1951, originally for flood forecasting. The idea is that soil has memory: rain that fell last week still affects how much of today's rain runs off rather than soaking in. WaterWatch uses a daily decay model: each day, the previous day's wetness carries over at 85% of its value and that day's rainfall is added. After a dry spell, AWI drops toward zero. After two weeks of rain, even a modest shower produces significant runoff, and the pressure score reflects that.
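The daily update is a one-line recurrence: today's AWI is 0.85 times yesterday's AWI plus today's rainfall. A minimal sketch follows. The 0.85 carry-over comes from the description above; the function name, the starting value of zero, and rainfall in millimetres are assumptions made for the example.

```python
DECAY = 0.85  # daily carry-over described in the text


def awi_series(daily_rain_mm: list[float], initial: float = 0.0) -> list[float]:
    """Return the AWI after each day, given that day's rainfall."""
    awi = initial
    series = []
    for rain in daily_rain_mm:
        # Yesterday's wetness decays, then today's rainfall is added on top.
        awi = DECAY * awi + rain
        series.append(awi)
    return series


# One 10 mm day followed by a dry spell: the index decays by 15% per day.
dry_spell = awi_series([10.0] + [0.0] * 13)

# Two weeks of steady 5 mm days: the index keeps building toward a plateau.
wet_fortnight = awi_series([5.0] * 14)
```

With a constant daily rainfall r, the recurrence levels off at r / (1 - 0.85), so two weeks of 5 mm days leaves the index close to its plateau of about 33, which is why a modest shower at that point lands on already-wet ground.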
A site that overflows every time it rains at all has chronically saturated catchment soil — its activation threshold is effectively zero. AWI captures this. A site that only overflows after sustained multi-day rainfall events has a higher effective threshold. The pressure score makes comparisons across different weather windows meaningful.
The improvement analysis tool
We built an internal research tool for analysing whether a specific infrastructure upgrade actually changed a site's discharge behaviour. You give it a site and an upgrade date; it compares spill rates per 100 pressure units across a 24-month window before and after, controlling for weather. It flags data quality problems automatically — including the case where a rain gauge had gaps that make the normalised rate misleading.
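In outline, the comparison could look like the sketch below. This is an illustration under stated assumptions, not the internal tool: `DayRecord`, `window_rate`, `compare_upgrade`, and the 90% coverage cut-off are all hypothetical, the window is assumed to apply to each side of the upgrade date, and a month is approximated as 30.44 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class DayRecord:
    day: date
    spills: int
    pressure: Optional[float]  # None when the rain gauge has no reading that day


def window_rate(days: list[DayRecord], min_coverage: float = 0.9) -> dict:
    """Spills per 100 pressure units, flagged when gauge coverage is too low."""
    known = [d for d in days if d.pressure is not None]
    coverage = len(known) / len(days) if days else 0.0
    pressure = sum(d.pressure for d in known)
    # Count spills only on days with known pressure, so both sides of the ratio match.
    spills = sum(d.spills for d in known)
    rate = 100 * spills / pressure if pressure > 0 else None
    return {"rate": rate, "coverage": coverage, "flagged": coverage < min_coverage}


def compare_upgrade(records: list[DayRecord], upgrade: date, months: int = 24):
    """Return (before, after) rates for equal windows either side of the upgrade."""
    span = timedelta(days=round(months * 30.44))
    before = [r for r in records if upgrade - span <= r.day < upgrade]
    after = [r for r in records if upgrade <= r.day < upgrade + span]
    return window_rate(before), window_rate(after)
```

The `flagged` field is there for the gauge-gap case: when too many days in a window have no pressure reading, the normalised rate is still computed but should not be trusted without a closer look.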
This tool is not public-facing — it is used to produce verified, data-backed articles about specific upgrades. If you are a journalist or researcher interested in running an analysis on a particular site, get in touch. The full methodology is explained in a separate post.
What we honestly don't know
A platform like this only earns trust if it is honest about its limits. There are several things WaterWatch cannot tell you, and will not pretend to.
We do not know the volume of sewage that discharged — only the duration. A 12-hour discharge from a large combined sewer could represent vastly more volume than a 24-hour discharge from a small rural overflow. Duration alone is not a proxy for harm.
We do not know the ecological impact. That depends on river flow, dilution ratio, water temperature, species composition, cumulative discharges from upstream sites, and agricultural runoff — none of which we have at a per-episode level. A reduction in spill hours does not automatically mean the river is healthier. It means one specific source of pollution reduced.
We do not know if the sensors are calibrated correctly. We report what they report. If a sensor records a longer or shorter duration than the actual overflow, we have no way to detect that from the data alone. Sensor gaps are tracked and surfaced explicitly; missing data is never shown as zero.
WaterWatch does not state that a site has “improved” based on a single year's data. Year-on-year spill hour comparisons without rainfall context are nearly meaningless, and we treat them accordingly. The site pages show the raw data and let you form your own view. We are building toward a longer baseline before making confident trend assessments. More on the methodology: how we determine if a site is improving.
Independence
WaterWatch has no commercial relationship with Thames Water, the Environment Agency, or any water industry body. The platform is funded by cloud credits, a free CDN plan, and personal time. There is no editorial pressure and no advertiser with a stake in how CSO data is presented.
Every discharge figure on WaterWatch comes from Thames Water's own publicly available sensor data, the same data they are required to publish. We do not apply editorial weighting, we do not average out inconvenient spikes, and we do not apply smoothing that would make the data look less alarming. We show what the sensors report.
That said: we are one developer and a small proprietary database. If you spot something that looks wrong (a site shown as discharging when you know the sensor is offline, a gap that shouldn't be there, a number that doesn't match official reporting), please tell us. The address is hello@water-watch.co.uk. We will look into it and be transparent about what we find.
All data on WaterWatch is publicly available and can be cited. If you are working on a story and want to verify a figure, request a specific analysis, or understand a particular site's history, get in touch. We can provide the underlying data and a walk-through of how any given number was derived.
Why this is built by a 17-year-old in sixth form
Because nobody else was doing it. The data existed. The API was public. The tools were available. What was missing was the time and frustration to actually sit down and build it. I had both.
WaterWatch started as a personal project to answer one question: which CSO sites near me are the worst? It turned into a 500-site network monitor, an environmental pressure scoring model, a subscriber alert system, and a research tool for forensically examining whether upgrades actually work. Every methodology is documented, and all of it is free.
If you are a journalist, a researcher, an activist, or someone who swims in a Thames tributary and would like to know what is going into it upstream — this platform is for you. The data belongs to everyone. We just made it readable.
Data sources: Thames Water EDM open data API · Environment Agency rainfall monitoring network · EA river level monitoring · Defra Storm Overflow Discharge Reduction Plan (2022)
Related: How we determine if a site is improving · How we analyse infrastructure upgrades