
Why you can trust what WaterWatch tells you

Daniel Walls
Founder, WaterWatch
15 Mar 2026 · 8 min read
700+ sites monitored
15 min update frequency
150k+ historical records
100% free, always

Every number you see on WaterWatch comes from one place: Thames Water's own Event Duration Monitoring sensor network. We don't estimate, extrapolate, or fill gaps. We don't apply severity scores or editorial weightings. When a sensor records a discharge, we show it. When it doesn't, we don't invent one. This sounds obvious — but in the current landscape of environmental data reporting, it's rarer than it should be.

This article explains exactly how our data pipeline works, what we do to verify it, and where the boundaries of our knowledge honestly lie. Trust shouldn't be asked for — it should be earned, and it should be specific.

Where the data actually comes from

Thames Water operates a network of Event Duration Monitoring (EDM) sensors at over 700 permitted combined sewer overflow (CSO) sites across the Thames region. These sensors detect when sewage begins discharging into waterways and when it stops. The same data is submitted to regulators including the Environment Agency: it's not a courtesy dataset, it's a regulatory record.

Thames Water exposes this data through a public API. WaterWatch polls that API every 15 minutes. We capture every transition we observe (every Start and Stop event) and store it with the precise timestamp Thames Water recorded. Nothing more, nothing less.
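The comparison between one poll and the next can be sketched as follows. This is a minimal illustration, not WaterWatch's production code: the status labels, the `SiteSnapshot` shape, and the `detectTransitions` name are all assumptions made for the example.

```typescript
// Status labels and field names are illustrative assumptions, not Thames Water's schema.
type SiteStatus = "Discharging" | "Not discharging" | "Offline";

interface SiteSnapshot {
  status: SiteStatus;
  // Time of the most recent status change as reported by the source (naive local time).
  statusChangedAt: string;
}

interface TransitionEvent {
  siteId: string;
  kind: "Start" | "Stop";
  at: string;
}

// Compare the previous poll with the current one and emit one event per changed site.
function detectTransitions(
  previous: Map<string, SiteSnapshot>,
  current: Map<string, SiteSnapshot>,
): TransitionEvent[] {
  const events: TransitionEvent[] = [];
  current.forEach((now, siteId) => {
    const before = previous.get(siteId);
    // A site seen for the first time has no earlier state to compare against.
    if (before === undefined || before.status === now.status) return;
    if (now.status === "Discharging") {
      events.push({ siteId, kind: "Start", at: now.statusChangedAt });
    } else if (before.status === "Discharging") {
      events.push({ siteId, kind: "Stop", at: now.statusChangedAt });
    }
  });
  return events;
}
```

Because the event carries the timestamp reported by the source rather than the time of the poll, a 15-minute polling interval does not blur the recorded start or stop time in this sketch.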

Direct source

WaterWatch reads directly from api.thameswater.co.uk/opendata/v2 — the same endpoint Thames Water makes available to the public and regulators. We are a presentation layer, not an interpretation layer.

The timezone problem we fixed

Early in WaterWatch's development, we discovered a significant data integrity issue, one that affects any system ingesting from Thames Water's API without careful handling. Thames Water returns timestamps in naive local time: a string like 2024-07-15T14:30:00 with no timezone suffix. During British Summer Time (BST, UTC+1), the actual UTC time is 13:30, but naively treating the string as UTC stores it as 14:30 UTC, an hour late.

This bug — one that could easily go undetected — caused historical records to be stored with incorrect timestamps, and in some cases created phantom duplicate records. When we identified it, we didn't patch around it. We built a full remediation system that audited every single one of our 150,000+ historical records against Thames Water's API, corrected every affected timestamp, and verified the fixes programmatically.

547 sites fully audited and verified in the remediation pass
150k+ historical records checked against source data
6 chunks per site, auditing back to April 2022 in 6-month windows
0 timezone errors remaining in live data after the fix deployment

The corrected ingestion pipeline has been live since March 2026. Every new record is converted from Thames Water's naive local time to UTC before storage. The conversion uses the IANA timezone database via JavaScript's Intl.DateTimeFormat API — meaning BST and GMT transitions are handled precisely, including edge cases at the DST boundary.
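A conversion of this kind can be sketched with Intl.DateTimeFormat alone. The function names below are hypothetical and this is one common way to do it, not necessarily the exact production code. It asks the IANA database for the Europe/London offset at a candidate instant, then checks the offset once more at the corrected instant so that timestamps near a clock change settle on the right side of it.

```typescript
// Offset of `timeZone` from UTC, in milliseconds, at the given instant
// (for Europe/London: 3600000 during BST, 0 during GMT).
function zoneOffsetMs(instant: Date, timeZone: string): number {
  const parts = new Intl.DateTimeFormat("en-GB", {
    timeZone,
    hour12: false,
    year: "numeric", month: "2-digit", day: "2-digit",
    hour: "2-digit", minute: "2-digit", second: "2-digit",
  }).formatToParts(instant);
  const get = (type: string): number =>
    Number(parts.find((p) => p.type === type)?.value);
  // Rebuild the zone's wall-clock reading as if it were UTC, then compare.
  // The % 24 guards against engines that render midnight as hour 24.
  const wallAsUtc = Date.UTC(
    get("year"), get("month") - 1, get("day"),
    get("hour") % 24, get("minute"), get("second"),
  );
  return wallAsUtc - Math.floor(instant.getTime() / 1000) * 1000;
}

// Convert a naive London local timestamp such as "2024-07-15T14:30:00" to a UTC Date.
function naiveLondonToUtc(naive: string): Date {
  const m = /^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})$/.exec(naive);
  if (!m) throw new Error(`Unexpected timestamp format: ${naive}`);
  const [y, mo, d, h, mi, s] = m.slice(1).map(Number);
  const wallAsUtc = Date.UTC(y, mo - 1, d, h, mi, s);
  // First guess: the offset at the wall-clock reading taken as UTC.
  let offset = zoneOffsetMs(new Date(wallAsUtc), "Europe/London");
  // Second pass: re-check the offset at the corrected instant.
  offset = zoneOffsetMs(new Date(wallAsUtc - offset), "Europe/London");
  return new Date(wallAsUtc - offset);
}

// naiveLondonToUtc("2024-07-15T14:30:00").toISOString() → "2024-07-15T13:30:00.000Z"
```

One caveat applies to any approach: a naive string inside the repeated hour when the clocks go back is ambiguous on its own, and this sketch resolves it to the later (GMT) reading.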

What we show and what we don't

A lot of environmental data platforms apply their own scoring, classification, or severity ratings on top of raw sensor data. WaterWatch doesn't. We surface exactly three things: whether a site is discharging right now, when it started and stopped discharging historically, and for how long each episode lasted. Duration is calculated from the raw timestamps — we don't weight it, cap it, or smooth it.
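As an illustration, computing an episode's duration from raw timestamps takes very little. The function name and the convention of passing `null` for an episode that is still running are assumptions made for this sketch.

```typescript
// Duration of one discharge episode in minutes, from UTC ISO 8601 timestamps.
// A null stop time means the episode is still running, so it is measured up to `now`.
function episodeDurationMinutes(
  startUtc: string,
  stopUtc: string | null,
  now: Date = new Date(),
): number {
  const start = Date.parse(startUtc);
  const stop = stopUtc === null ? now.getTime() : Date.parse(stopUtc);
  if (isNaN(start) || isNaN(stop)) throw new Error("Unparseable timestamp");
  if (stop < start) throw new Error("Stop precedes start");
  // No weighting, capping, or smoothing: just the difference between the two instants.
  return (stop - start) / 60000;
}

// episodeDurationMinutes("2024-07-15T13:30:00Z", "2024-07-15T15:45:00Z") → 135
```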

“We surface sensor limitations rather than hiding them. If a site was offline for three months, you'll see that gap. We won't fill it with estimates.”

Where sensors are offline — either because of maintenance, failure, or periods where Thames Water's API returned no data — WaterWatch shows those gaps explicitly as offline periods. They appear as distinct events in site histories. This is a deliberate choice: a gap in data is not the same as a gap in discharges, and we will not present them as equivalent.
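One way to keep that distinction in data is to give offline periods their own entry kind and total them separately, so that an unmonitored hour can never be counted as a discharge-free hour. The `HistoryEntry` shape and `summariseHistory` name below are hypothetical.

```typescript
// A site history where offline periods are first-class entries, not silent gaps.
interface HistoryEntry {
  kind: "discharge" | "offline";
  startUtc: string;
  stopUtc: string;
}

interface HistorySummary {
  dischargeMinutes: number;
  offlineMinutes: number;
}

// Total each kind separately; offline time is reported, never folded into "no discharge".
function summariseHistory(entries: HistoryEntry[]): HistorySummary {
  const summary: HistorySummary = { dischargeMinutes: 0, offlineMinutes: 0 };
  for (const entry of entries) {
    const minutes = (Date.parse(entry.stopUtc) - Date.parse(entry.startUtc)) / 60000;
    if (entry.kind === "discharge") {
      summary.dischargeMinutes += minutes;
    } else {
      summary.offlineMinutes += minutes;
    }
  }
  return summary;
}
```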

How our pipeline works

Understanding the architecture helps you understand where our data can and can't be trusted. Here's the full chain:

Stage | What happens | Frequency
--- | --- | ---
TW API poll | AWS Lambda fetches live discharge status for all 700+ sites | Every 15 min
Transition detection | Compares current status to previous; records Start/Stop events when status changes | Every 15 min
Timezone conversion | All TW naive local timestamps converted to UTC before any storage | Every write
Live cache | Cloudflare D1 database stores current site statuses for fast map rendering | Continuous
Historical store | Supabase PostgreSQL holds all discharge events since April 2022 | Continuous
Integrity spot-check | Automated Lambda samples sites and compares DB records against live TW API | Continuous

The spot-check system runs continuously in the background, sampling a random set of sites on every run and comparing our stored records against what Thames Water's API currently returns. If anything is missing, we're alerted within the hour. This isn't a quarterly audit — it's permanent, automated vigilance.
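The core of a spot-check like this is two small steps: pick a random sample of sites, then list any source records that the database lacks. The sketch below shows both under assumed names and record shapes; the fetching of records from the API and the database is deliberately left out.

```typescript
// Hypothetical record shape used only for this comparison.
interface EventRecord {
  siteId: string;
  kind: "Start" | "Stop";
  atUtc: string;
}

// Pick up to `n` distinct sites at random (partial Fisher-Yates shuffle).
// `random` is injectable so the sampling can be made deterministic in tests.
function sampleSites(
  siteIds: string[],
  n: number,
  random: () => number = Math.random,
): string[] {
  const pool = siteIds.slice();
  const count = Math.min(n, pool.length);
  for (let i = 0; i < count; i++) {
    const j = i + Math.floor(random() * (pool.length - i));
    const tmp = pool[i];
    pool[i] = pool[j];
    pool[j] = tmp;
  }
  return pool.slice(0, count);
}

// Records present at the source but absent from our store.
function findMissing(source: EventRecord[], stored: EventRecord[]): EventRecord[] {
  const key = (e: EventRecord): string => `${e.siteId}|${e.kind}|${e.atUtc}`;
  const have = new Set(stored.map(key));
  return source.filter((e) => !have.has(key(e)));
}
```

A non-empty result from `findMissing` for any sampled site is what would trigger an alert in this design.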

[Figure: WaterWatch system architecture. Source: Thames Water EDM API. AWS Lambda: live ingestion every 15 minutes, with UTC conversion. Cloudflare Worker: API and cron (5-minute cron, public API proxy). Cloudflare D1: live cache of 700+ site statuses and map data. Supabase PostgreSQL: 150k+ events, full history. Vercel: Next.js app, the public interface. Integrity monitoring: a spot-check Lambda samples sites continuously and alerts on missing records. Legend: live data flow; read / cache.]

What we don't claim

Honesty about limitations is as important as accuracy in what we show. Here is what WaterWatch explicitly does not claim:

We don't claim | Why
--- | ---
Real-time to the second | Our ingestion runs every 15 minutes. A discharge that begins and ends within a 15-minute window may not be captured.
Complete pre-2022 history | Thames Water's API does not reliably serve data older than April 2022. We don't backfill with estimates.
Volume or severity | EDM sensors record duration, not volume. We don't convert hours into litres: that calculation requires flow rate data we don't have.
Environmental impact | We record that a discharge occurred and for how long. Ecological impact depends on factors (dilution, river flow, species sensitivity) outside our data.
Sensor accuracy | We present what Thames Water's sensors record. If a sensor malfunctions, the data we show reflects that malfunction.

Why this matters

In 2023, the Environmental Audit Committee found that EDM sensor coverage across English water companies was incomplete and that data quality varied significantly. Thames Water has since invested in expanding its network. WaterWatch's role is not to adjudicate on that quality — it's to present it faithfully and let users draw their own conclusions.

We built WaterWatch because the gap between raw data and public understanding of sewage discharges was too large. Media coverage often cited figures that were difficult to trace to primary sources. Academic analyses were paywalled or required specialist knowledge to interpret. The data was public in theory — but not in practice.

A platform that claims to close that gap has an obligation to be rigorously honest about its own limitations. That means showing sensor gaps, not hiding them. It means publishing our methodology, not summarising it. It means fixing data integrity bugs publicly and completely, not patching them quietly.

Data methodology

All discharge events shown on WaterWatch are derived directly from Thames Water's Event Duration Monitoring (EDM) API. Timestamps are stored in UTC. No editorial weighting, severity scoring, or volume estimation is applied. Offline periods are surfaced as distinct events. Historical data begins April 2022.

If you find a discrepancy — a site showing the wrong duration, a record that doesn't match Thames Water's own data portal, anything that looks wrong — email us. Data quality is not a launch feature. It's an ongoing responsibility.

How the architecture is built

WaterWatch runs across three layers: an AWS Lambda that ingests from Thames Water every 15 minutes, a Cloudflare Worker that serves the live map and site data, and a Supabase PostgreSQL database that stores the full historical record. These are deliberately separate concerns — the ingestion layer can fail without taking down the public interface, and the Worker can be updated without touching the historical store.

What happens if Thames Water's API goes down?

This is a question worth answering precisely rather than vaguely. The short answer: nothing breaks immediately, and full correction happens within 15 minutes of the API coming back up.

The live ingestion Lambda polls Thames Water every 15 minutes. If a poll fails — whether because Thames Water's API is down, rate-limiting requests, or returning malformed data — the Lambda logs the error and exits. The Cloudflare D1 cache retains the last known status of all 700+ sites. The public map continues to serve data.

Recovery timeline

0–15 minutes of outage: No visible impact. D1 cache serves last known status.

15+ minutes: The status page timestamp shows data age — users can see it's stale.

On API recovery: The next Lambda invocation succeeds. Within 15 minutes, all sites are updated and any missed transitions are captured.

Full correction: Complete within one ingestion cycle (≤15 minutes) of the API recovering.
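The staleness signal in that timeline reduces to comparing the age of the last successful poll with the polling interval. A minimal sketch, with a hypothetical function name and return shape:

```typescript
// Polling interval from the pipeline description; data older than this is stale.
const POLL_INTERVAL_MINUTES = 15;

interface Freshness {
  ageMinutes: number;
  stale: boolean;
}

// Describe how old the cached data is, given the time of the last successful poll.
function describeFreshness(
  lastSuccessfulPollUtc: string,
  now: Date = new Date(),
): Freshness {
  const ageMs = now.getTime() - Date.parse(lastSuccessfulPollUtc);
  const ageMinutes = Math.floor(ageMs / 60000);
  return { ageMinutes, stale: ageMinutes > POLL_INTERVAL_MINUTES };
}
```

Exposing the age itself, not just a stale flag, lets users judge for themselves how far to trust the map during an outage.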

[Figure: API outage recovery timeline. t=0: outage begins; the map serves last known status from the D1 cache and no new records are written. t+15 min: status shows stale. t+N: API back. t+N+15 min: full correction, after the next Lambda cycle (≤15 min) captures all missed events and normal operations resume.]

If something goes wrong in our pipeline, how long does correction take?

Different failure modes have different correction windows. Here's the honest breakdown:

Failure type | Detection time | Correction time
--- | --- | ---
Lambda ingestion failure (single cycle) | Immediate (CloudWatch alarm) | ≤15 min on recovery
Timestamp conversion bug (new code) | Minutes (spot-check Lambda samples sites continuously) | Fix deployed + remediation run: typically 4–8 hours for all 547 sites
Missing records (ingestion gap) | Minutes (spot-check flags missing records against live TW API) | Manual catch-up run: 1–2 hours
D1 cache stale (Worker cron failure) | 5 min (/status endpoint timestamp goes stale) | Next successful cron: ≤5 min
Thames Water API data correction | Hours (spot-check detects mismatch) | Next remediation run: 4–8 hours

Sources: Thames Water EDM API (api.thameswater.co.uk/opendata/v2) · Environment Agency Event Duration Monitoring data · Water Industry Act 1991, Schedule 22 (EDM obligations)

Written by Daniel Walls, Founder · WaterWatch

Daniel is a 17-year-old sixth form student and the sole developer behind WaterWatch. He built the platform after becoming frustrated by the gap between publicly available sewage discharge data and how that data was being reported. WaterWatch is independent, free, and has no commercial relationships with Thames Water or any water industry body.

daniel@water-watch.co.uk