
A Scalable Approach to Unit Conversion

Units

There is more to a unit than the string used for its symbol. However, for simple systems that only require a handful of units, a sophisticated system with full understanding of units is probably overkill.

For example, if we only have meters (m) and kilometers (km) to deal with, we can hardcode the metadata and the conversion logic relatively easily. The more units we have, the more difficult this approach becomes to maintain. What if we add centimeters (cm) and millimeters (mm)? What about imperial units such as feet (ft) or yards (yd)? What if it's not a unit of length at all, such as kilowatts (kW)?

If we want our unit system to scale (pun intended), we need a more sophisticated solution.

But first, a little bit of theory...

Dimensions

There are seven SI base units:

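| Base Quantity | Unit Name | Unit Symbol |
| --- | --- | --- |
| Time | second | s |
| Length | meter | m |
| Mass | kilogram | kg |
| Electric Current | ampere | A |
| Temperature | kelvin | K |
| Amount of Substance | mole | mol |
| Luminous Intensity | candela | cd |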

A unit’s dimension is the set of exponents (powers) of the seven base units that make up the unit. The table below gives a number of examples of units and their dimensions:

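| Unit | s | m | kg | A | K | mol | cd |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Celsius (°C) | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| Fahrenheit (°F) | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| Joule (J) | -2 | 2 | 1 | 0 | 0 | 0 | 0 |
| Pascal (Pa) | -2 | -1 | 1 | 0 | 0 | 0 | 0 |
| Lux (lx) | 0 | -2 | 0 | 0 | 0 | 0 | 1 |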

Notice that Celsius and Fahrenheit have the same dimensions (both units of temperature).

Some units do not fit this model (such as %) and are represented as dimensionless. In these cases, the dimensions are all set to zero.
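One way to hold a dimension in code is as a tuple of seven exponents, one per base unit. The sketch below is only an illustration: the `Dimension` class and its field names are my own assumptions, not part of any library.

```python
from typing import NamedTuple


class Dimension(NamedTuple):
    """Exponents of the seven SI base units (hypothetical helper type)."""

    s: int = 0
    m: int = 0
    kg: int = 0
    A: int = 0
    K: int = 0
    mol: int = 0
    cd: int = 0

    def is_dimensionless(self) -> bool:
        # A dimensionless unit (such as %) has every exponent set to zero.
        return all(power == 0 for power in self)


# The example units from the table above.
CELSIUS = Dimension(K=1)
FAHRENHEIT = Dimension(K=1)
JOULE = Dimension(s=-2, m=2, kg=1)
PASCAL = Dimension(s=-2, m=-1, kg=1)
LUX = Dimension(m=-2, cd=1)
PERCENT = Dimension()

# Two units measure the same kind of quantity only if their exponents match.
print(CELSIUS == FAHRENHEIT)
print(JOULE == PASCAL)
print(PERCENT.is_dimensionless())
```

Because a named tuple compares field by field, checking whether two units share a dimension is a plain equality test.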

Unit Conversion

If units have the same dimensions, it is possible to convert from one to the other using scales and offsets. Most units I've come across only need to be scaled but some, such as temperature, require an offset too.

A generic formula is needed to handle all cases. Let's start with a simple expression that we know to be true:

$$1\,\text{m} = 100\,\text{cm}$$

Unfortunately, our system won't be able to evaluate this because of the unit strings (if it can, great: you probably don't need to read this page). If we strip out the unit strings, the equation is no longer valid:

$$1 \neq 100$$

This is where scales and offsets come in. If we represent meters as A and centimeters as B, we can use the numeric values (n), scales (s) and offsets (o) to re-balance our equation as follows:

$$An * As + Ao = Bn * Bs + Bo$$

$$1 * 1 + 0 = 100 * 0.01 + 0$$

You're probably wondering where I got the scales and offsets from. For a given set of dimensions, a base unit must be selected and assigned a scale and offset of 1 and 0 respectively. Other units with the same dimensions are then assigned scales and offsets relative to the base unit. In the example above, the base unit is meter.

A little algebra gives us a formula that we can use to compute the numeric value in one unit, given the numeric value in another unit and the scales and offsets:

$$An = \frac{Bn * Bs + Bo - Ao}{As}$$
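The formula translates almost directly into code. The function below is a minimal sketch; the name `convert` and its parameter names are assumptions chosen for illustration, with B as the unit we are converting from and A as the unit we are converting to.

```python
def convert(value_b: float, scale_b: float, offset_b: float,
            scale_a: float, offset_a: float) -> float:
    """Convert a numeric value in unit B to unit A.

    Implements An = (Bn * Bs + Bo - Ao) / As.
    """
    return (value_b * scale_b + offset_b - offset_a) / scale_a


# (scale, offset) pairs relative to the base unit, meters.
METER = (1.0, 0.0)
CENTIMETER = (0.01, 0.0)

meters = convert(100, *CENTIMETER, *METER)       # 100 cm expressed in m
centimeters = convert(1, *METER, *CENTIMETER)    # 1 m expressed in cm
print(meters, centimeters)
```

The numerator converts the value into the base unit, and the subtraction and division then map it from the base unit into the target unit.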

As mentioned above, temperature units require both scales and offsets for conversion so let's use them as a more complete example:

| Unit | Scale | Offset |
| --- | --- | --- |
| Kelvin (K) | 1 | 0 |
| Celsius (°C) | 1 | 273.15 |
| Fahrenheit (°F) | 0.555555556 | 255.37222223 |

Let's say we want to convert 20°C to Fahrenheit. Here A is Fahrenheit and B is Celsius:

$$An = \frac{20 * 1 + 273.15 - 255.372}{0.556} \approx 67.946$$

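This gives us the correct answer: 20°C is about 68°F. The result falls slightly short of 68 only because the scale and offset were rounded to three decimal places. Our equation works!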
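The same calculation can be checked in code using the unrounded scales and offsets. This is a sketch under a few assumptions: the `Unit` class is hypothetical, the dimension is stored as an opaque string, and the function refuses to convert between units whose dimensions differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A unit with its scale and offset relative to the base unit (hypothetical type)."""

    symbol: str
    dimension: str        # compared as-is; units must match to be convertible
    scale: float = 1.0
    offset: float = 0.0


def convert(value: float, source: Unit, target: Unit) -> float:
    """Convert value from source to target: An = (Bn * Bs + Bo - Ao) / As."""
    if source.dimension != target.dimension:
        raise ValueError(f"cannot convert {source.symbol} to {target.symbol}")
    return (value * source.scale + source.offset - target.offset) / target.scale


KELVIN = Unit("K", "K1")
CELSIUS = Unit("°C", "K1", 1.0, 273.15)
FAHRENHEIT = Unit("°F", "K1", 0.5555555555555556, 255.37222222222223)

# Rounded for display; floating-point arithmetic may leave a tiny residue.
print(round(convert(20, CELSIUS, FAHRENHEIT), 3))
print(round(convert(20, CELSIUS, KELVIN), 3))
```

With the full-precision values, 20°C comes out at 68°F to within floating-point error.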

Unit Database

The only thing left to do is the small task of enumerating all of the units and defining their dimensions and relative scales and offsets.

Thankfully, the good people of Project Haystack maintain a database of units. Their unit system is where I learnt about the above unit conversion methodology.

A simple txt file is used for the unit database. A row in the file defines a unit if it has the following format:

<name>, <alias>, <symbol>; <dimension>; <scale>; <offset>

In this database, each unit can have one or more unique identifiers.

The <symbol> is the abbreviated version of the unit. All units have a symbol.

The <name> is a descriptive summary of the unit using words separated by underscores such as kilograms_per_second. Sometimes, the name is the same as the symbol, in which case the unit only has a symbol.

If there are multiple symbols for a unit, they can each be included as an <alias>. The one assigned to the symbol should be the default.

| Symbol | Name | Aliases |
| --- | --- | --- |
| day | - | - |
| kg/s | kilograms_per_second | - |
| m² | square_meter | - |
| h | hour | hr |

The <dimension> specifies the dimensions of the unit in the form of SI units and their powers. The dimensions of the example units above are shown below:

| Unit | Dimensions |
| --- | --- |
| Celsius (°C) | K1 |
| Fahrenheit (°F) | K1 |
| Joule (J) | kg1*m2*sec-2 |
| Pascal (Pa) | kg1*m-1*sec-2 |
| Lux (lx) | m-2*cd1 |
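A dimension string of this shape can be split into base units and powers with a small parser. The sketch below assumes that terms are separated by `*` and that each term is a run of letters followed by a signed integer, as in the examples above; the function name `parse_dimension` is hypothetical.

```python
import re

# One term: a base-unit name made of letters, then a signed integer power.
_TERM = re.compile(r"([A-Za-z]+)(-?\d+)")


def parse_dimension(text: str) -> dict[str, int]:
    """Parse a string such as 'kg1*m-1*sec-2' into {'kg': 1, 'm': -1, 'sec': -2}."""
    powers: dict[str, int] = {}
    if not text.strip():
        return powers  # treat an empty string as dimensionless
    for term in text.split("*"):
        match = _TERM.fullmatch(term.strip())
        if match is None:
            raise ValueError(f"unrecognised dimension term: {term!r}")
        base, power = match.groups()
        powers[base] = int(power)
    return powers


print(parse_dimension("kg1*m2*sec-2"))
print(parse_dimension("m-2*cd1"))
```

Base units that do not appear in the string are simply absent from the result, which is equivalent to a power of zero.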

The <scale> and <offset> are the numeric values for the scale and offset respectively. If omitted, they default to 1 and 0.

For the temperature units above, their definition within the txt file is as follows:

fahrenheit, °F; K1; 0.5555555555555556; 255.37222222222223
celsius, °C; K1; 1.0; 273.15
kelvin, K; K1
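Rows in this format are simple enough to parse by splitting on semicolons and commas. The following is a rough sketch rather than a complete reader for the file: `UnitDef` and `parse_unit_line` are hypothetical names, the last identifier is assumed to be the symbol and the first the name, and a missing dimension is assumed to mean dimensionless.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitDef:
    """One parsed unit row (hypothetical type)."""

    name: str
    symbol: str
    aliases: tuple[str, ...]
    dimension: str
    scale: float = 1.0
    offset: float = 0.0


def parse_unit_line(line: str) -> UnitDef:
    """Parse '<name>, <alias>, <symbol>; <dimension>; <scale>; <offset>'."""
    fields = [field.strip() for field in line.split(";")]
    ids = [part.strip() for part in fields[0].split(",")]
    if not ids[0]:
        raise ValueError(f"no identifiers in line: {line!r}")
    # With a single identifier, the name and the symbol are the same string.
    name, symbol, aliases = ids[0], ids[-1], tuple(ids[1:-1])
    dimension = fields[1] if len(fields) > 1 else ""
    scale = float(fields[2]) if len(fields) > 2 else 1.0    # default scale
    offset = float(fields[3]) if len(fields) > 3 else 0.0   # default offset
    return UnitDef(name, symbol, aliases, dimension, scale, offset)


for row in (
    "fahrenheit, °F; K1; 0.5555555555555556; 255.37222222222223",
    "celsius, °C; K1; 1.0; 273.15",
    "kelvin, K; K1",
):
    print(parse_unit_line(row))
```

A real reader would also need to skip blank lines and any rows in the file that are not unit definitions.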

At the time of writing, this was the units.txt file.

Conclusion

With the conversion formula above and Project Haystack's unit database, it's fairly straightforward to build your own unit converter!