
An Introduction to Time Series Data

Timeseries

Time series data always has two components: the timestamp and the value.

Timestamp

The timestamp includes the date, time and timezone. We need all three of these components to know the absolute moment in time when something happened (timezones are omitted from the examples below).
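To illustrate why the timezone matters, here is a minimal sketch using Python's standard `datetime` module. The fixed UTC-5 offset is an assumption chosen purely for illustration; the two readings share the same date and time but are not the same moment.

```python
from datetime import datetime, timedelta, timezone

# Two readings with identical date and time but different (assumed) timezones.
utc_reading = datetime(2019, 1, 1, 0, 0, tzinfo=timezone.utc)
offset_reading = datetime(2019, 1, 1, 0, 0, tzinfo=timezone(timedelta(hours=-5)))

# Subtracting timezone-aware datetimes compares absolute moments in time.
gap = offset_reading - utc_reading

print(gap)  # the two readings are five hours apart
```

Without the timezone, the two timestamps would look identical and the five-hour difference would be lost.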

I have worked with customer data from sites all over the world, and timezones have caused me many headaches over the years! The reasons are outside the scope of this article, but there is a great article about falsehoods programmers believe about time zones.

Interval

With time series data, the duration of time between consecutive readings can be important. This is referred to as the interval.

5min Interval       1hr Interval        1day Interval
2019-01-01 00:00    2019-01-01 00:00    2019-01-01 00:00
2019-01-01 00:05    2019-01-01 01:00    2019-01-02 00:00
2019-01-01 00:10    2019-01-01 02:00    2019-01-03 00:00
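The interval can be derived from the timestamps themselves. The following is a minimal sketch, assuming the readings are already sorted by time; the helper name `intervals` is hypothetical and used only for illustration.

```python
from datetime import datetime, timedelta

def intervals(timestamps):
    """Return the gaps between consecutive timestamps (assumes sorted input)."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

# The 5min Interval column from the table above.
readings = [
    datetime(2019, 1, 1, 0, 0),
    datetime(2019, 1, 1, 0, 5),
    datetime(2019, 1, 1, 0, 10),
]

gaps = intervals(readings)

# A series has a regular interval when every gap is the same.
is_regular = all(gap == timedelta(minutes=5) for gap in gaps)
```

A check like `is_regular` can be a quick way to spot missing or late readings in a series.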

Value

While the timestamp is common for all data sources, the value type differs depending on the source.

Common value types are as follows:

Numeric

Numeric time series data is probably the most common format. In this case, the value component of the data is a number. The numeric value may also have an associated unit. An example of numeric time series data would be the temperature of a room:

Timestamp           Room Temperature
2019-01-01 00:00    20 °C
2019-01-01 00:05    21 °C
2019-01-01 00:10    19 °C

With numeric time series data, it is possible to fold the data, that is, to reduce a series of values to a single summary value. For example, the average temperature of the above data is 20 °C and the maximum temperature is 21 °C.
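As a minimal sketch of folding, the room temperature readings can be reduced to a single value with Python's built-in and standard library functions. The variable names are illustrative only.

```python
from statistics import mean

# Room temperature values in °C, taken from the example readings.
temperatures_c = [20, 21, 19]

# Each fold reduces the whole series to one number.
average_c = mean(temperatures_c)
maximum_c = max(temperatures_c)

print(average_c, maximum_c)
```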

Boolean

With Boolean time series data, the value component of the data is either TRUE or FALSE. An example of boolean time series data would be whether a valve is open or not:

Timestamp           Valve Open
2019-01-01 00:00    TRUE
2019-01-01 00:05    TRUE
2019-01-01 00:10    FALSE

It is not possible to fold boolean time series data using the numeric fold functions. For example, it is not possible to get the maximum value of the above data.

Often, boolean time series data is provided in a “Change of Value” format, or “COV” for short. COV data is compressed by recording a value only when it differs from the previous value. The valve open example could be written in Boolean COV format as follows:

Timestamp           Valve Open
2019-01-01 00:00    TRUE
2019-01-01 00:10    FALSE

Notice that the second timestamp with a value of TRUE is omitted. This is because the value didn’t change from the previous timestamp so there’s no need to include it.
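The COV compression described above can be sketched in a few lines. This is an illustrative sketch rather than any particular system's implementation; the function name `to_cov` and the use of (timestamp, value) tuples are assumptions.

```python
def to_cov(readings):
    """Keep only the readings whose value differs from the previous kept reading."""
    compressed = []
    for timestamp, value in readings:
        # Always keep the first reading; afterwards keep only changes of value.
        if not compressed or compressed[-1][1] != value:
            compressed.append((timestamp, value))
    return compressed

# The valve open example, as (timestamp, value) pairs.
valve_open = [
    ("2019-01-01 00:00", True),
    ("2019-01-01 00:05", True),
    ("2019-01-01 00:10", False),
]

cov = to_cov(valve_open)
```

Here `cov` holds only the 00:00 and 00:10 readings, matching the COV table.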

Period

With Period time series data, the value component of the data is a period of time. Technically it is a numeric series, but the value must be a number with a time unit. The value represents the period of time when some condition was true and the timestamp represents the start time. An example of period time series data would be the length of time a fan was running:

Timestamp           Fan Run Time
2019-01-01 00:30    1.5 hr
2019-01-01 07:10    30 min
2019-01-01 14:50    3 hr
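Because each period value is a number with a time unit, it maps naturally onto a duration type. The sketch below assumes the fan run times are stored as (start, duration) pairs using Python's `timedelta`; this representation is an illustration, not a required format.

```python
from datetime import datetime, timedelta

# Fan run times as (start timestamp, duration) pairs.
fan_runs = [
    (datetime(2019, 1, 1, 0, 30), timedelta(hours=1.5)),
    (datetime(2019, 1, 1, 7, 10), timedelta(minutes=30)),
    (datetime(2019, 1, 1, 14, 50), timedelta(hours=3)),
]

# Since the values are numeric durations, they can be folded like other numeric data.
total_run_time = sum((duration for _, duration in fan_runs), timedelta())

# The end of each run is the start timestamp plus the duration.
end_times = [start + duration for start, duration in fan_runs]
```

Storing durations in a single type also removes the mixed `hr` and `min` units seen in the table.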

String

With String time series data, the value component is a string. An example of string time series data would be the operating mode of a machine:

Timestamp           Operating Mode
2019-01-01 00:00    "RUNNING"
2019-01-01 06:25    "SHUTTING DOWN"
2019-01-01 06:40    "OFF"

As with boolean data, it is not possible to use numeric fold functions on string data.