2018-05-01

Totalised Histories: Deltas, Rollovers, and the Read Pipeline

The numbers a totalised meter persists aren't the numbers a chart shows. Between them sits a read pipeline, and most of the bugs live there.

Timeseries, Data Quality, SkySpark

A meter reads 1,050 kWh at 09:00 and 1,200 kWh at 10:00. How much energy was consumed between those readings, and when was it consumed? The first answer is 150 kWh. The second is more interesting than it looks. Get it wrong and your hourly chart shifts an hour to the right, and sooner or later someone will wonder why their data's an hour off.

Reading isn't writing

It helps to keep a clear separation in your mind between the data that's persisted and the data that's returned from a query. Unless you specifically query the persisted history, what you get back has likely been through a transformation pipeline. Folding and interpolation are one example of such a transformation; others include unit conversion, data-quality correction, and timestamp shifting. Holding that separation in your head is what makes time series weirdness debuggable when it shows up.

Read pipeline

What gets persisted is rarely what comes back. The read pipeline reshapes it on the way out.

Persisted (on disk) → Transformations → Returned (to caller)

All of those transformations rest on a single rule about timestamps: each one marks the start of a period. A reading at 10:00 with a five-minute interval is the value valid from 10:00 (inclusive) up to 10:05 (exclusive). Different databases have different conventions (some put the timestamp at the start of the period, others at the end), but this article assumes start-of-period throughout. Every other rule below is downstream of that one.
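
The start-of-period rule is just a half-open interval test. A minimal sketch, assuming a hypothetical helper named `covers` (not from any particular database):

```python
from datetime import datetime, timedelta

def covers(stamp: datetime, interval: timedelta, moment: datetime) -> bool:
    """True if `moment` falls in the half-open period [stamp, stamp + interval)."""
    return stamp <= moment < stamp + interval

reading_at = datetime(2018, 5, 1, 10, 0)
five_minutes = timedelta(minutes=5)

# 10:04:59 still belongs to the 10:00 reading; 10:05:00 belongs to the next one.
inside = covers(reading_at, five_minutes, datetime(2018, 5, 1, 10, 4, 59))
at_end = covers(reading_at, five_minutes, datetime(2018, 5, 1, 10, 5))
```

Here `inside` is `True` and `at_end` is `False`: the end of one period is the start of the next, so no instant is counted twice.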

For an instantaneous reading like a temperature probe sampled every five minutes, that semantics fits naturally. A value of 15°C at 10:00 means the probe read 15°C from 10:00 until 10:05. Folding three of those into a fifteen-minute average gives you the average temperature over the window. No tricks.
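
As a rough sketch of that fold, assuming samples arrive as (minutes since midnight, value) pairs at a fixed interval; the function name `fold_average` and the probe values are made up for illustration:

```python
from statistics import mean

def fold_average(samples, group_size):
    """Average consecutive groups of `group_size` samples.

    Each output keeps the timestamp of the first sample in its group, so the
    folded series is still start-of-period. Incomplete trailing groups are
    dropped.
    """
    folded = []
    for i in range(0, len(samples) - group_size + 1, group_size):
        group = samples[i:i + group_size]
        folded.append((group[0][0], mean(value for _, value in group)))
    return folded

# Hypothetical probe data at 10:00, 10:05 and 10:10 (minutes since midnight).
five_minute = [(600, 15.0), (605, 16.0), (610, 17.0)]
fifteen_minute = fold_average(five_minute, 3)
```

The single folded value is stamped 10:00 and covers 10:00 up to 10:15, which is the same rule applied at a coarser interval.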

Totalised consumption

Energy meters work differently. Most are totalised: they store a continuously incrementing reading rather than a per-period delta. The database turns that into consumption by computing the difference between consecutive readings, typically at read time. Sum the raw values and you'd get an absurd number; sum the deltas and you get total consumption.
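
A minimal sketch of that difference step, using five hourly readings in kWh. The `deltas` helper is illustrative, not any database's API:

```python
def deltas(readings):
    """Differences between consecutive totalised readings."""
    return [curr - prev for prev, curr in zip(readings, readings[1:])]

# Hourly totalised readings in kWh, 08:00 through 12:00.
persisted = [1000, 1050, 1200, 1300, 1450]

consumption = deltas(persisted)
total = sum(consumption)
raw_sum = sum(persisted)
```

The deltas sum to 450 kWh, which is simply the last reading minus the first. Summing the raw readings gives 6,000, a number that means nothing.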

The complexity is in the timestamp. Going back to our example at the top: 1,050 kWh at 09:00 and 1,200 kWh at 10:00. The delta of 150 kWh is the energy consumed between those two readings. The natural place to attach it is at the timestamp where the difference was computed, 10:00. But the start-of-period rule says a value at 10:00 represents the period starting at 10:00, not the one ending there. Left alone, the consumption shows up an hour late.

The fix is for the read pipeline to shift each delta back by one period before returning it. The 150 kWh now lives at 09:00, where it belongs. This shift is usually opt-in, controlled by a flag on the point that marks the history as consumption rather than instantaneous readings.
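
One way to picture the shift, as a sketch rather than how any particular database implements it: compute each delta from a pair of readings, then attach it to the earlier timestamp of the pair. The name `shifted_deltas` is hypothetical.

```python
def shifted_deltas(readings):
    """Return (timestamp, delta) pairs with each delta attached to the
    start of the period it covers.

    `readings` is a list of (timestamp, totalised_value) pairs in time order.
    """
    return [
        (prev_ts, curr_val - prev_val)
        for (prev_ts, prev_val), (_, curr_val) in zip(readings, readings[1:])
    ]

readings = [("09:00", 1050), ("10:00", 1200)]
result = shifted_deltas(readings)
```

The 150 kWh comes back stamped 09:00. The last reading in any series has no delta of its own, because the period it starts hasn't finished yet.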

Read pipeline

The same five raw readings, viewed through each layer of the read pipeline. The Persisted column is what's actually on disk: a continuously incrementing meter reading.

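| Time | Persisted | Calculation | Delta | Shifted |
| --- | --- | --- | --- | --- |
| 08:00 | 1,000 kWh | | | 50 kWh |
| 09:00 | 1,050 kWh | 1,050 − 1,000 | 50 kWh | 150 kWh |
| 10:00 | 1,200 kWh | 1,200 − 1,050 | 150 kWh | 100 kWh |
| 11:00 | 1,300 kWh | 1,300 − 1,200 | 100 kWh | 150 kWh |
| 12:00 | 1,450 kWh | 1,450 − 1,300 | 150 kWh | |
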
If the deltas are already shifted when they reach storage, the read pipeline will shift them a second time. Every value lands a period off. The bug typically looks like a timezone problem until someone realises it isn't.

Storing totalised readings rather than deltas comes with a side benefit that doesn't always get advertised. Imagine an outage between 09:00 and 12:00. With sampled deltas you'd lose three hours of consumption forever. With a totalised meter, the next reading at 12:00 still includes everything that happened during the outage. You lose the profile, but you keep the magnitude.
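
The outage case is easy to check by hand. A small sketch, assuming the 10:00 and 11:00 readings never arrived (the helper name is illustrative):

```python
def total_consumption(readings):
    """Sum of consecutive differences. With no rollover this telescopes
    to last - first, however many readings sit in between."""
    return sum(b - a for a, b in zip(readings, readings[1:]))

complete = [1000, 1050, 1200, 1300, 1450]   # 08:00 through 12:00, no gaps
with_outage = [1000, 1050, 1450]            # 10:00 and 11:00 are missing

total_complete = total_consumption(complete)
total_with_outage = total_consumption(with_outage)
```

Both totals are 450 kWh. What the outage costs you is the hourly breakdown: the whole 400 kWh between 09:00 and 12:00 arrives as a single delta.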

Rollovers

The persisted reading only climbs. With enough time and enough use, every counter eventually runs out of room.

Analogue meters have a finite range. When the counter reaches its maximum it rolls over and starts again at zero, the way an old car odometer flips from 999,999 back to 000,000. The next read sees a value smaller than the previous one and the delta computation produces a negative number.

Digital meters inherit the same problem in a different form. The values just look weirder. The rollover happens at the limits of fixed-width integer storage rather than at decimal digit limits, typically 65,535 (16 bits) or 4,294,967,295 (32 bits). Whenever you see a history roll over at one of those values, you're looking at integer overflow rather than a digit wheel.
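
Those limits fall straight out of the bit width. The sketch below also includes a rough heuristic for guessing the width from the highest reading seen before a drop; the 5% tolerance is an arbitrary assumption, and it ignores any scaling applied between the register and the stored value.

```python
# Largest value an unsigned integer of each width can hold.
KNOWN_LIMITS = {bits: (1 << bits) - 1 for bits in (16, 32)}

def likely_width(peak, tolerance=0.05):
    """Guess the register width from the last reading before a drop.

    Returns the bit width whose maximum is at or just above `peak`
    (within `tolerance` as a fraction of the limit), else None.
    """
    for bits, limit in sorted(KNOWN_LIMITS.items()):
        if peak <= limit and (limit - peak) / limit <= tolerance:
            return bits
    return None

limit_16 = KNOWN_LIMITS[16]
limit_32 = KNOWN_LIMITS[32]
guess = likely_width(65_120)
```

A peak of 65,120 sits just under the 16-bit limit, so the guess is 16. A peak of 999,950 matches neither limit and looks more like a six-digit wheel.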

In a normal period the delta between two readings is the straightforward difference:

$$\text{delta} = \text{current} - \text{previous}$$

If we know the value at which the meter rolls over, we can capture it as part of the meter's configuration and treat any negative delta as a rollover rather than a fresh start. Across a rollover, the formula becomes:

$$\text{delta} = (\text{rollover} - \text{previous}) + \text{current}$$

That recovers the consumption that crossed the rollover point. Without it, every rollover silently under-counts by however much the meter passed through between the previous reading and its maximum. The table below applies both rules across ten hourly readings; the calculation column shows the swap explicitly.

Reading across a rollover

Ten hourly readings on a meter that rolls over at 999 kWh.

The 999 kWh rollover is illustrative; a real meter wouldn't roll over at this magnitude.

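| Time | Persisted | Calculation | Delta |
| --- | --- | --- | --- |
| 08:00 | 700 kWh | | |
| 09:00 | 750 kWh | 750 − 700 | 50 kWh |
| 10:00 | 810 kWh | 810 − 750 | 60 kWh |
| 11:00 | 890 kWh | 890 − 810 | 80 kWh |
| 12:00 | 950 kWh | 950 − 890 | 60 kWh |
| 13:00 (rollover) | 30 kWh | (999 − 950) + 30 | 79 kWh |
| 14:00 | 80 kWh | 80 − 30 | 50 kWh |
| 15:00 | 140 kWh | 140 − 80 | 60 kWh |
| 16:00 | 220 kWh | 220 − 140 | 80 kWh |
| 17:00 | 310 kWh | 310 − 220 | 90 kWh |
| Total consumption | | | 609 kWh |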
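
The two rules fit in one small function. This is a sketch that follows the formulas above exactly, with `rollover` taken from the meter's configuration; the function name is made up for illustration.

```python
def delta_with_rollover(previous, current, rollover):
    """Consumption between two totalised readings.

    A reading lower than the previous one is treated as a rollover,
    not as a fresh start.
    """
    if current >= previous:
        return current - previous
    return (rollover - previous) + current

# The ten hourly readings from the table, in kWh.
persisted = [700, 750, 810, 890, 950, 30, 80, 140, 220, 310]

hourly = [
    delta_with_rollover(prev, curr, rollover=999)
    for prev, curr in zip(persisted, persisted[1:])
]
total = sum(hourly)
```

The deltas match the table and sum to 609 kWh. One thing to check against a real meter: if the configured value is the highest number the counter can display rather than the point at which it wraps, an integer counter also ticks once going from that maximum to zero, so you may want to configure the maximum plus one.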

What we did at CoolPlanet

I also wrote an internal CoolPlanet wiki on this topic, mostly because every new engineer hit the same totalisation landmines. The most damaging of those landmines is the spike. One bad reading, an order of magnitude out, and the rest of the day's chart flattens around it.

Spike injector

Twenty-four hourly readings from a totalised meter, with the computed hourly consumption. Replace the 09:00 reading with a single bad value and that one rogue reading cascades into two corrupt deltas.

The 503,200 kWh spike is illustrative; the cascade pattern is what matters, not the magnitude.

| Time | Persisted | Calculation | Delta | Shifted |
| --- | --- | --- | --- | --- |
| 00:00 | 50,000 kWh | | | 25 kWh |
| 01:00 | 50,025 kWh | 50,025 − 50,000 | 25 kWh | 22 kWh |
| 02:00 | 50,047 kWh | 50,047 − 50,025 | 22 kWh | 20 kWh |
| 03:00 | 50,067 kWh | 50,067 − 50,047 | 20 kWh | 20 kWh |
| 04:00 | 50,087 kWh | 50,087 − 50,067 | 20 kWh | 25 kWh |
| 05:00 | 50,112 kWh | 50,112 − 50,087 | 25 kWh | 35 kWh |
| 06:00 | 50,147 kWh | 50,147 − 50,112 | 35 kWh | 50 kWh |
| 07:00 | 50,197 kWh | 50,197 − 50,147 | 50 kWh | 75 kWh |
| 08:00 | 50,272 kWh | 50,272 − 50,197 | 75 kWh | 95 kWh |
| 09:00 | 50,367 kWh | 50,367 − 50,272 | 95 kWh | 110 kWh |
| 10:00 | 50,477 kWh | 50,477 − 50,367 | 110 kWh | 120 kWh |
| 11:00 | 50,597 kWh | 50,597 − 50,477 | 120 kWh | 130 kWh |
| 12:00 | 50,727 kWh | 50,727 − 50,597 | 130 kWh | 125 kWh |
| 13:00 | 50,852 kWh | 50,852 − 50,727 | 125 kWh | 120 kWh |
| 14:00 | 50,972 kWh | 50,972 − 50,852 | 120 kWh | 115 kWh |
| 15:00 | 51,087 kWh | 51,087 − 50,972 | 115 kWh | 100 kWh |
| 16:00 | 51,187 kWh | 51,187 − 51,087 | 100 kWh | 85 kWh |
| 17:00 | 51,272 kWh | 51,272 − 51,187 | 85 kWh | 70 kWh |
| 18:00 | 51,342 kWh | 51,342 − 51,272 | 70 kWh | 60 kWh |
| 19:00 | 51,402 kWh | 51,402 − 51,342 | 60 kWh | 50 kWh |
| 20:00 | 51,452 kWh | 51,452 − 51,402 | 50 kWh | 40 kWh |
| 21:00 | 51,492 kWh | 51,492 − 51,452 | 40 kWh | 35 kWh |
| 22:00 | 51,527 kWh | 51,527 − 51,492 | 35 kWh | 30 kWh |
| 23:00 | 51,557 kWh | 51,557 − 51,527 | 30 kWh | |

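
The cascade can be reproduced with a few lines. This sketch takes the 07:00 to 11:00 readings from the table and swaps the 09:00 value for the illustrative 503,200 kWh spike:

```python
def deltas(readings):
    """Differences between consecutive totalised readings."""
    return [b - a for a, b in zip(readings, readings[1:])]

clean = [50197, 50272, 50367, 50477, 50597]   # 07:00 through 11:00, in kWh

spiked = list(clean)
spiked[2] = 503200                             # the bad 09:00 reading

clean_deltas = deltas(clean)
spiked_deltas = deltas(spiked)
corrupted = sum(1 for c, s in zip(clean_deltas, spiked_deltas) if c != s)
```

The clean deltas are 75, 95, 110 and 120 kWh. With the spike they become 75, 452,928, −452,723 and 120: a huge positive delta going into the bad reading and a huge negative one coming out. The two cancel, so the total over the window is unchanged, but the hourly profile is ruined, and a pipeline that treats every negative delta as a rollover would misread the second one.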

We weren't catching a 100 kW boiler registering 101 kW; that's just measurement. We were catching the same meter suddenly registering 10,000 kW, the kind of value that's a glitch upstream rather than a real measurement. Spikes were just the start. Over time we built up a library of patterns we'd seen in the wild: stuck readings, negative consumption, unit slips. Each had a signature, and each got a detector and either a correction or a filter applied automatically on read.
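
To give a flavour of what a signature looks like, here is a deliberately naive sketch. It is not the CoolPlanet library: the labels, the median-based threshold and the `spike_factor` default are all assumptions made for illustration.

```python
from statistics import median

def flag_deltas(deltas, spike_factor=10.0):
    """Label each delta 'ok', 'negative', 'stuck' or 'spike'.

    A delta is a spike if it exceeds `spike_factor` times the median
    positive delta. A zero delta is labelled 'stuck' here; a real detector
    would want a run of them, since a single idle period is plausible.
    """
    positive = [d for d in deltas if d > 0]
    typical = median(positive) if positive else 0
    labels = []
    for d in deltas:
        if d < 0:
            labels.append("negative")
        elif d == 0:
            labels.append("stuck")
        elif typical and d > spike_factor * typical:
            labels.append("spike")
        else:
            labels.append("ok")
    return labels

labels = flag_deltas([75, 452928, -452723, 120, 0])
```

On that input the labels come out as ok, spike, negative, ok, stuck. Using the median rather than the mean matters: one bad delta drags the mean far enough that the spike would hide itself.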

Our time-series database, SkySpark, exposes hooks between the persisted history and the read pipeline. We placed the data-quality library at those hooks. It runs on every history read, so performance had no slack: we dropped from Axon to Fantom for the hot path and painstakingly justified every line. The result is two coexisting views of the same data: raw on disk for incident debugging, cleansed for the rest of the platform.
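
The shape of that arrangement, stripped of anything SkySpark-specific, looks roughly like the sketch below. The function names, the hook signature and the fixed ceiling are all hypothetical; this is not SkySpark's API.

```python
def read_history(persisted, hooks=()):
    """Return a copy of `persisted` after passing it through each hook in order.

    The persisted rows are never modified; every hook returns a new list.
    """
    rows = list(persisted)
    for hook in hooks:
        rows = hook(rows)
    return rows

def drop_implausible(rows, ceiling=100_000):
    """Illustrative filter: discard readings above a fixed ceiling."""
    return [(ts, value) for ts, value in rows if value <= ceiling]

on_disk = [("08:00", 50272), ("09:00", 503200), ("10:00", 50477)]

raw = read_history(on_disk)
cleansed = read_history(on_disk, hooks=[drop_implausible])
```

The raw view still contains the 09:00 glitch for whoever is debugging the incident; the cleansed view drops it. Because the hooks run on read, changing a detector changes every past chart without rewriting anything on disk.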