I built WaterWatch because I was annoyed. Thames Water publishes live data from their sewage overflow sensor network — and almost nobody was reading it. So I started reading it. Here is everything I found, and exactly how the platform that came out of it works.

Where the data comes from

Thames Water operates Event Duration Monitors (EDM sensors) at combined sewer overflow (CSO) sites across its network. Each sensor does one thing: it records the moment a discharge starts and the moment it stops. That is all. Simple, but powerful.

Thames Water makes this data available via a public API. WaterWatch polls that API every 15 minutes, around the clock, across all sites simultaneously. Every discharge event gets logged — start time, stop time, duration — and stored in a database that now goes back to April 2022. As of today, that is over 77,000 confirmed discharge episodes across around 500 sites.
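As a rough illustration of the polling idea, here is a minimal sketch of how two successive snapshots could be turned into start and stop events. The snapshot shape (a mapping from site ID to a discharging flag) and the names `detect_transitions` and `POLL_INTERVAL_SECONDS` are assumptions made for this example. They are not the Thames Water API schema or the actual WaterWatch code.

```python
from typing import Dict, List, Tuple

# Poll interval described in the text: every 15 minutes.
POLL_INTERVAL_SECONDS = 15 * 60

def detect_transitions(
    previous: Dict[str, bool], current: Dict[str, bool]
) -> List[Tuple[str, str]]:
    """Compare two hypothetical snapshots (site_id -> is_discharging)
    and return (site_id, "start" | "stop") for every site that changed."""
    events: List[Tuple[str, str]] = []
    for site_id, is_discharging in sorted(current.items()):
        was_discharging = previous.get(site_id)
        if was_discharging is None:
            # First time this site is seen: no baseline, so no event.
            continue
        if is_discharging and not was_discharging:
            events.append((site_id, "start"))
        elif was_discharging and not is_discharging:
            events.append((site_id, "stop"))
    return events

previous = {"A": False, "B": True, "C": False}
current = {"A": True, "B": False, "C": False, "D": True}
print(detect_transitions(previous, current))
```

In this sketch a site seen for the first time produces no event, because there is nothing to compare it against yet.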

What EDM sensors actually measure

Duration and timing. The sensor tells you a discharge was active from 14:32 to 17:08 on Tuesday. It does not tell you how much sewage flowed, how far it travelled, or what was in it. Every number on WaterWatch is derived from duration data alone — we do not estimate volumes, pollution loads, or ecological impact.
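To make "duration data alone" concrete, here is a small sketch that turns the example above into a number. The function name and the calendar date are illustrative assumptions. Only the start and stop times come from the text.

```python
from datetime import datetime

def discharge_duration_minutes(start: datetime, stop: datetime) -> float:
    """Duration of one discharge episode in minutes, from its two timestamps."""
    if stop < start:
        raise ValueError("stop time precedes start time")
    return (stop - start).total_seconds() / 60.0

# 14:32 to 17:08 on a Tuesday (the date itself is made up for the example).
start = datetime(2024, 1, 2, 14, 32)
stop = datetime(2024, 1, 2, 17, 8)
print(discharge_duration_minutes(start, stop))  # 156.0 minutes, i.e. 2 h 36 min
```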

How episodes are detected and cleaned

Raw sensor data has noise. Sensors occasionally produce brief apparent activations during maintenance, telemetry glitches, or API hiccups. Short activations under a minimum threshold are flagged as likely noise and not counted as genuine discharge episodes. If a sensor briefly goes offline mid-discharge and comes back, the records are stitched together rather than counted as two separate events.
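The two cleaning rules can be sketched roughly as follows. Both thresholds (`MIN_DURATION` and `MAX_STITCH_GAP`) are placeholder values chosen for illustration, not the values WaterWatch uses. The sketch also simplifies by stitching any two records separated by a short gap, without checking that the sensor was actually offline in between.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Episode = Tuple[datetime, datetime]  # (start, stop)

MIN_DURATION = timedelta(minutes=5)     # hypothetical noise threshold
MAX_STITCH_GAP = timedelta(minutes=30)  # hypothetical "brief dropout" limit

def clean_episodes(raw: List[Episode]) -> List[Episode]:
    """Stitch records split by a brief gap, then drop very short activations."""
    stitched: List[Episode] = []
    for start, stop in sorted(raw):
        if stitched and start - stitched[-1][1] <= MAX_STITCH_GAP:
            # Close enough to the previous record: treat as one episode.
            prev_start, prev_stop = stitched[-1]
            stitched[-1] = (prev_start, max(prev_stop, stop))
        else:
            stitched.append((start, stop))
    # Anything still shorter than the threshold is treated as likely noise.
    return [(s, e) for s, e in stitched if e - s >= MIN_DURATION]

day = datetime(2024, 1, 2)
raw = [
    (day.replace(hour=10), day.replace(hour=11)),
    (day.replace(hour=11, minute=10), day.replace(hour=12)),  # stitched to the first
    (day.replace(hour=15), day.replace(hour=15, minute=2)),   # 2-minute blip
]
print(clean_episodes(raw))
```

Stitching before filtering is a design choice in this sketch: it stops a real episode from being discarded just because a dropout chopped it into short fragments.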

Sensor gaps — periods where the sensor was offline entirely — are tracked separately and shown explicitly on site pages. A gap in the data is not treated as a period of no-discharge. It is treated as unknown. We think this distinction matters a lot, and it is one most media coverage of this data misses entirely.
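One way to keep "unknown" distinct from "no discharge" is to give it its own status. The sketch below is illustrative only: the `Status` enum, the `status_at` function, and the interval representation are assumptions, not the platform's actual data model.

```python
from datetime import datetime
from enum import Enum
from typing import List, Tuple

Interval = Tuple[datetime, datetime]  # (start, end), end exclusive

class Status(Enum):
    DISCHARGING = "discharging"
    NOT_DISCHARGING = "not discharging"
    UNKNOWN = "unknown"

def status_at(t: datetime, episodes: List[Interval], gaps: List[Interval]) -> Status:
    """Status of a site at time t. A sensor gap always wins: if the sensor
    was offline, the answer is UNKNOWN rather than NOT_DISCHARGING."""
    if any(start <= t < end for start, end in gaps):
        return Status.UNKNOWN
    if any(start <= t < end for start, end in episodes):
        return Status.DISCHARGING
    return Status.NOT_DISCHARGING

day = datetime(2024, 1, 2)
episodes = [(day.replace(hour=14), day.replace(hour=16))]
gaps = [(day.replace(hour=20), day.replace(hour=22))]
print(status_at(day.replace(hour=21), episodes, gaps))  # Status.UNKNOWN
```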

“77,000 discharge episodes. Each one is a specific moment when sewage entered a specific river. That is not an estimate. That is a timestamp.”

Why raw spill counts aren't enough

Here is the problem that makes sewage data analysis genuinely hard: CSOs are designed to overflow when it rains. All else being equal, a dry year produces fewer spills and a wet year produces more. If you compare two years without accounting for rainfall, you are comparing the weather, not the infrastructure.

This is why WaterWatch built an environmental pressure scoring model. Instead of asking “how many times did this site overflow?”, we ask “how many times did it overflow relative to how much pressure it was under?” A site that overflowed 40 times in an exceptionally wet year might actually be performing better than a site that overflowed 30 times in a dry one.
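A worked version of that comparison, with invented pressure totals, might look like this. The spill counts come from the paragraph above. The pressure totals (4000 and 1500) and the function name are made up purely to show the arithmetic.

```python
def spills_per_100_pressure(spill_count: int, total_pressure: float) -> float:
    """Spill count normalised by accumulated environmental pressure."""
    if total_pressure <= 0:
        raise ValueError("total_pressure must be positive")
    return 100.0 * spill_count / total_pressure

# Hypothetical totals: the wet-year site faced far more pressure.
wet_year_site = spills_per_100_pressure(40, 4000.0)
dry_year_site = spills_per_100_pressure(30, 1500.0)
print(wet_year_site, dry_year_site)  # 1.0 2.0
```

With these made-up totals, the site with more raw spills overflows half as often per unit of pressure.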

The environmental pressure score

pressure = rain_mm + (AWI × 0.6) + (river_anomaly × 0.4)

Rain (mm): 24-hour rainfall total from the nearest Environment Agency gauge. Direct driver of sewer surcharge.

AWI × 0.6: Antecedent Wetness Index. Soil has memory: rain that fell last week still affects how much runoff today's rain produces.

River anomaly × 0.4: How far above normal the local river is running. A swollen river backpressures the outfall and makes overflow more likely.
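Written as code, the formula is a one-liner. The sketch below takes the weights directly from the formula above. The function name and the sample input values are illustrative, and the units of the river anomaly are not specified here, so the example numbers are arbitrary.

```python
def pressure_score(rain_mm: float, awi: float, river_anomaly: float) -> float:
    """pressure = rain_mm + (AWI x 0.6) + (river_anomaly x 0.4)"""
    return rain_mm + 0.6 * awi + 0.4 * river_anomaly

# Arbitrary example inputs: 12 mm of rain, AWI of 20, river anomaly of 5.
score = pressure_score(rain_mm=12.0, awi=20.0, river_anomaly=5.0)
print(f"{score:.1f}")  # 26.0
```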

Rainfall comes from the nearest Environment Agency rain gauge to each site — the same network used for flood warnings. River level data comes from EA river monitoring stations matched to each outfall. The AWI (Antecedent Wetness Index) is the interesting one.

AWI was first formalised by Kohler and Linsley in 1951, originally for flood forecasting. The idea is that soil has memory: rain that fell last week still affects how much of today's rain runs off rather than soaking in. WaterWatch uses a daily decay model — each day, the previous day's wetness carries over at 85% of its value and that day's rainfall is added. After a dry spell, AWI drops toward zero. After two weeks of rain, even a modest shower produces significant runoff — and the pressure score reflects that.
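The daily decay model described above can be sketched as a simple recurrence: today's AWI is 85% of yesterday's plus today's rainfall. The function name and the starting value of zero are assumptions. The 0.85 carry-over factor is the one stated in the text.

```python
from typing import Iterable, List

def awi_series(
    daily_rain_mm: Iterable[float], decay: float = 0.85, initial: float = 0.0
) -> List[float]:
    """AWI_today = decay * AWI_yesterday + rain_today, one value per day."""
    values: List[float] = []
    awi = initial
    for rain in daily_rain_mm:
        awi = decay * awi + rain
        values.append(awi)
    return values

# One 10 mm day followed by two dry days: wetness fades but does not vanish.
print([round(v, 3) for v in awi_series([10.0, 0.0, 0.0])])  # [10.0, 8.5, 7.225]
```

At an 85% daily carry-over, a single day's rain still contributes roughly a third of its original value a week later (0.85 to the seventh power is about 0.32), which is the "memory" the index is meant to capture.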

AWI in practice

A site that overflows every time it rains at all has chronically saturated catchment soil — its activation threshold is effectively zero. AWI captures this. A site that only overflows after sustained multi-day rainfall events has a higher effective threshold. The pressure score makes comparisons across different weather windows meaningful.

The improvement analysis tool

We built an internal research tool for analysing whether a specific infrastructure upgrade actually changed a site's discharge behaviour. You give it a site and an upgrade date; it compares spill rates per 100 pressure units across a 24-month window before and after, controlling for weather. It flags data quality problems automatically — including the case where a rain gauge had gaps that make the normalised rate misleading.
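The core of such a comparison could be sketched like this. Everything here is illustrative rather than the actual tool: the `DayRecord` shape, the function names, and the 10% missing-data threshold are assumptions, and the caller is assumed to have already split the daily records into the before and after windows.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class DayRecord:
    spills: int                # discharge episodes starting that day
    pressure: Optional[float]  # None when the rain gauge had a gap

def window_summary(days: List[DayRecord]) -> Dict[str, float]:
    """Spills per 100 pressure units, using only days with pressure data."""
    usable = [d for d in days if d.pressure is not None]
    total_pressure = sum(d.pressure for d in usable)
    if total_pressure <= 0:
        raise ValueError("no usable pressure data in window")
    return {
        "rate_per_100": 100.0 * sum(d.spills for d in usable) / total_pressure,
        "missing_fraction": 1.0 - len(usable) / len(days),
    }

def compare_upgrade(
    before: List[DayRecord], after: List[DayRecord], max_missing: float = 0.10
) -> Dict[str, object]:
    """Compare normalised spill rates and flag windows with too many gaps."""
    b, a = window_summary(before), window_summary(after)
    return {
        "before_rate": b["rate_per_100"],
        "after_rate": a["rate_per_100"],
        # Hypothetical rule: too many gauge gaps makes the rate untrustworthy.
        "unreliable": b["missing_fraction"] > max_missing
        or a["missing_fraction"] > max_missing,
    }

before = [DayRecord(1, 50.0) for _ in range(4)]
after = [DayRecord(1, 100.0), DayRecord(0, 100.0)]
print(compare_upgrade(before, after))
```

Days with missing pressure are excluded from both the spill count and the pressure total in this sketch, so a gap shrinks the sample instead of silently inflating the rate.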

This tool is not public-facing — it is used to produce verified, data-backed articles about specific upgrades. If you are a journalist or researcher interested in running an analysis on a particular site, get in touch. The full methodology is explained in a separate post.

What we honestly don't know

A platform like this only earns trust if it is honest about its limits. There are several things WaterWatch cannot tell you, and will not pretend to.

We do not know the volume of sewage that discharged — only the duration. A 12-hour discharge from a large combined sewer could represent vastly more volume than a 24-hour discharge from a small rural overflow. Duration alone is not a proxy for harm.

We do not know the ecological impact. That depends on river flow, dilution ratio, water temperature, species composition, cumulative discharges from upstream sites, and agricultural runoff, none of which we have at a per-episode level. A reduction in spill hours does not automatically mean the river is healthier. It means one specific source of pollution was reduced.

We do not know if the sensors are calibrated correctly. We report what they report. If a sensor records a longer or shorter duration than the actual overflow, we have no way to detect that from the data alone. Sensor gaps are tracked and surfaced explicitly; missing data is never shown as zero.

On improvement claims

WaterWatch does not state that a site has “improved” based on a single year's data. Year-on-year spill hour comparisons without rainfall context are nearly meaningless, and we treat them accordingly. The site pages show the raw data and let you form your own view. We are building toward a longer baseline before making confident trend assessments. More on the methodology: how we determine if a site is improving.

Independence

WaterWatch has no commercial relationship with Thames Water, the Environment Agency, or any water industry body. The platform is funded by AWS credits, a Cloudflare free tier, and personal time. There is no editorial pressure and no advertiser with a stake in how CSO data is presented.

Every number on WaterWatch comes from Thames Water's own publicly available sensor data — the same data they are required to publish. We do not apply editorial weighting, we do not average out inconvenient spikes, and we do not apply smoothing that would make the data look less alarming. We show what the sensors report.

That said: we are one developer and a free-tier database. If you spot something that looks wrong (a site shown as discharging when you know the sensor is offline, a gap that shouldn't be there, a number that doesn't match official reporting), please tell us. The address is hello@water-watch.co.uk. We will look into it and be transparent about what we find.

For journalists and researchers

All data on WaterWatch is publicly available and can be cited. If you are working on a story and want to verify a figure, request a specific analysis, or understand a particular site's history, get in touch. We can provide the underlying data and a walk-through of how any given number was derived.

Why this is built by a 17-year-old in sixth form

Because nobody else was doing it. The data existed. The API was public. The tools were available. What was missing was the time and frustration to actually sit down and build it. I had both.

WaterWatch started as a personal project to answer one question: which CSO sites near me are the worst? It turned into a 500-site network monitor, an environmental pressure scoring model, a subscriber alert system, and a research tool for forensically examining whether upgrades actually work. Every methodology is documented, and all of it is free.

If you are a journalist, a researcher, an activist, or someone who swims in a Thames tributary and would like to know what is going into it upstream — this platform is for you. The data belongs to everyone. We just made it readable.

Don't just eyeball the map — use alerts

The most common misuse of WaterWatch is checking once, seeing a red status, and treating that snapshot as the whole verdict. “Site X is spilling” is a moment in time, not a story. Without duration, history, or rainfall context, it's the sewage-data equivalent of judging someone's health by one bad photo.

Better move: use alerts. WaterWatch alerts tell you when a site starts and stops discharging, so you follow the full episode rather than catching one dramatic frame. Set an alert for your local site and let the data come to you — your refresh key has earned a rest.

Recommended workflow

1) Check the live status.
2) Check the recent event history and duration trend.
3) Set up alerts at /alerts so you get start/stop notifications automatically.

That gives you context, chronology, and fewer bad takes per minute.

Data sources: Thames Water EDM open data API · Environment Agency rainfall monitoring network · EA river level monitoring · Defra Storm Overflow Discharge Reduction Plan (2022)

How WaterWatch actually works — and why you can trust it