Time series data always has two components: the timestamp and the value.
Timestamp
The timestamp includes the date, time and timezone. We need all three of these components to know the absolute moment in time when something happened (timezones are omitted from the examples below).
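To see why the timezone matters, here is a minimal Python sketch. It assumes two hypothetical sites that both record the same wall-clock reading, and it uses fixed UTC offsets (UTC+00:00 and UTC-05:00, chosen purely for illustration) as a simplified stand-in for real timezones.

```python
from datetime import datetime, timedelta, timezone

# The same wall-clock reading, with no timezone attached yet.
naive = datetime(2019, 1, 1, 0, 0)

# Hypothetical sites: fixed offsets stand in for real timezones here.
site_a = naive.replace(tzinfo=timezone.utc)                   # UTC+00:00
site_b = naive.replace(tzinfo=timezone(timedelta(hours=-5)))  # UTC-05:00

# Subtracting timezone-aware timestamps compares absolute moments in time.
gap = site_b - site_a
print(gap)  # 5:00:00
```

Without the offset, both readings look identical, even though they describe moments five hours apart.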
I have worked with customer data from sites all over the world, and timezones have caused me many headaches over the years! The reasons are outside the scope of this article, but there is a great article here about falsehoods programmers believe about time zones.
Interval
With time series data, the duration of time between consecutive readings can be important. This is referred to as the interval.
5min Interval | 1hr Interval | 1day Interval |
---|---|---|
2019-01-01 00:00 | 2019-01-01 00:00 | 2019-01-01 00:00 |
2019-01-01 00:05 | 2019-01-01 01:00 | 2019-01-02 00:00 |
2019-01-01 00:10 | 2019-01-01 02:00 | 2019-01-03 00:00 |
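
As a rough illustration, the interval can be worked out by subtracting each timestamp from the one that follows it. The sketch below uses the 5min column from the table above; the variable names are my own and not part of any particular library.

```python
from datetime import datetime, timedelta

# Timestamps from the "5min Interval" column above.
timestamps = [
    datetime(2019, 1, 1, 0, 0),
    datetime(2019, 1, 1, 0, 5),
    datetime(2019, 1, 1, 0, 10),
]

# Pair each reading with the next one and take the difference.
intervals = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

# The series is regular if every gap matches the expected interval.
is_regular = all(gap == timedelta(minutes=5) for gap in intervals)
print(is_regular)  # True
```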
Value
While the timestamp is common for all data sources, the value type differs depending on the source.
Common value types are as follows:
- Numeric
- Boolean
- Period
- String
Numeric
Numeric time series data is probably the most common format. In this case, the value component of the data is a number. The numeric value may also have an associated unit. An example of numeric time series data would be the temperature of a room:
timestamp | Room Temperature |
---|---|
2019-01-01 00:00 | 20 °C |
2019-01-01 00:05 | 21 °C |
2019-01-01 00:10 | 19 °C |
With numeric time series data, it is possible to fold the data, that is, to reduce the series of values to a single summary value. For example, the average temperature of the above data is 20 °C and the maximum temperature is 21 °C.
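The two folds mentioned above can be sketched in Python using only the standard library. The readings are the room temperatures from the table, with the unit left out for simplicity; `functools.reduce` is used for the maximum simply because it is Python's general-purpose fold.

```python
from functools import reduce
from statistics import mean

# Room temperature readings in °C, taken from the table above.
readings = [20, 21, 19]

# Fold the series down to a single average value.
average = mean(readings)

# Fold the series down to its largest value.
maximum = reduce(max, readings)

print(average)  # 20
print(maximum)  # 21
```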
Boolean
With Boolean time series data, the value component of the data is either TRUE or FALSE.
An example of boolean time series data would be whether a valve is open or not:
timestamp | Valve Open |
---|---|
2019-01-01 00:00 | TRUE |
2019-01-01 00:05 | TRUE |
2019-01-01 00:10 | FALSE |
It is not possible to fold boolean time series data using the numeric fold functions. For example, it is not possible to get the maximum value of the above data.
Often, boolean time series data is provided in a “Change of Value” format or “COV” for short. COV data can be compressed to only record values when they are different to the previous value. The valve open example could be written in Boolean COV format as follows:
timestamp | Valve Open |
---|---|
2019-01-01 00:00 | TRUE |
2019-01-01 00:10 | FALSE |
Notice that the second timestamp with a value of TRUE is omitted. This is because the value didn’t change from the previous timestamp so there’s no need to include it.
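Compressing regular readings into COV format might look something like the sketch below. `to_cov` is a hypothetical helper written for this example, and timestamps are kept as plain strings to keep it short.

```python
def to_cov(samples):
    """Keep only the samples whose value differs from the previously kept one."""
    cov = []
    for timestamp, value in samples:
        # Always keep the first sample, then only keep changes of value.
        if not cov or cov[-1][1] != value:
            cov.append((timestamp, value))
    return cov


# The "Valve Open" readings from the table above.
valve_open = [
    ("2019-01-01 00:00", True),
    ("2019-01-01 00:05", True),
    ("2019-01-01 00:10", False),
]

for timestamp, value in to_cov(valve_open):
    print(timestamp, value)
# 2019-01-01 00:00 True
# 2019-01-01 00:10 False
```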
Period
With Period time series data, the value component of the data is a period of time. Technically it is a numeric series, but the value must be a number with a time unit. The value represents the period of time when some condition was true and the timestamp represents the start time. An example of period time series data would be the length of time a fan was running:
timestamp | Fan Run Time |
---|---|
2019-01-01 00:30 | 1.5 hr |
2019-01-01 07:10 | 30 min |
2019-01-01 14:50 | 3 hr |
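
Because period values carry a time unit, they need to be brought to a common unit before they are combined. One way to sketch this in Python is with `timedelta`, which handles the unit conversion; the example below totals the fan run times from the table above.

```python
from datetime import timedelta

# Fan run times from the table above, each in its original unit.
run_times = [
    timedelta(hours=1.5),
    timedelta(minutes=30),
    timedelta(hours=3),
]

# Sum the periods, starting from an empty duration.
total_run_time = sum(run_times, timedelta())
print(total_run_time)  # 5:00:00
```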
String
With String time series data, the value component is a string. An example of string time series data would be the operating mode of a machine:
timestamp | Operating Mode |
---|---|
2019-01-01 00:00 | "RUNNING" |
2019-01-01 06:25 | "SHUTTING DOWN" |
2019-01-01 06:40 | "OFF" |
As with boolean data, it is not possible to use numeric fold functions on string data.
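String data can still be summarised in non-numeric ways. As one possible sketch, the code below works out how long the machine spent in each mode. It assumes that each mode stays in effect until the next timestamp, which the table does not state explicitly, and it skips the final mode because its end time is unknown.

```python
from datetime import datetime, timedelta

# Operating mode readings from the table above.
modes = [
    (datetime(2019, 1, 1, 0, 0), "RUNNING"),
    (datetime(2019, 1, 1, 6, 25), "SHUTTING DOWN"),
    (datetime(2019, 1, 1, 6, 40), "OFF"),
]

# Assumption: a mode lasts until the next reading replaces it.
durations = {}
for (start, mode), (end, _next_mode) in zip(modes, modes[1:]):
    durations[mode] = durations.get(mode, timedelta()) + (end - start)

for mode, duration in durations.items():
    print(mode, duration)
# RUNNING 6:25:00
# SHUTTING DOWN 0:15:00
```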