There is more to a unit than the string used for its symbol. However, for simple systems that only require a handful of units, a sophisticated system with full understanding of units is probably overkill.
For example, if we only have meters (m) and kilometers (km) to deal with, we can hardcode the metadata and the conversion logic relatively easily. The more units we have, the more difficult this approach becomes to maintain. What if we add centimeters (cm) and millimeters (mm)? What about imperial units such as feet (ft) or yards (yd)? What if it's not a unit of length at all, such as kilowatts (kW)?
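As a rough illustration of the hardcoded approach, a minimal sketch for meters and kilometers only might look like this. The function name `convert_length` and the lookup table `METERS_PER_UNIT` are assumptions made up for this example.

```python
# Hypothetical hardcoded converter: it only knows about meters and kilometers.
METERS_PER_UNIT = {"m": 1.0, "km": 1000.0}


def convert_length(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a length between the hardcoded units."""
    if from_unit not in METERS_PER_UNIT or to_unit not in METERS_PER_UNIT:
        raise ValueError(f"unsupported unit: {from_unit!r} or {to_unit!r}")
    # Go via meters: scale up to meters, then down to the target unit.
    return value * METERS_PER_UNIT[from_unit] / METERS_PER_UNIT[to_unit]


kilometers = convert_length(1500, "m", "km")
```

Every new unit means another table entry, and nothing in this table records that a kilowatt is not a length.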
If we want our unit system to scale (pun intended), we need a more sophisticated solution.
But first, a little bit of theory...
Dimensions
There are seven SI base units:
Base Quantity | Unit Name | Unit Symbol |
---|---|---|
Time | second | s |
Length | meter | m |
Mass | kilogram | kg |
Electric Current | ampere | A |
Temperature | kelvin | K |
Amount of Substance | mole | mol |
Luminous Intensity | candela | cd |
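A unit’s dimension describes how it is built from the seven base units: each base unit is raised to an integer power, and a power of zero means that base unit is not involved. The table below gives a number of examples of units and their dimensions: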
Unit | s | m | kg | A | K | mol | cd |
---|---|---|---|---|---|---|---|
Celsius (°C) | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
Fahrenheit (°F) | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
Joule (J) | -2 | 2 | 1 | 0 | 0 | 0 | 0 |
Pascal (Pa) | -2 | -1 | 1 | 0 | 0 | 0 | 0 |
Lux (lx) | 0 | -2 | 0 | 0 | 0 | 0 | 1 |
Notice that Celsius and Fahrenheit have the same dimensions (both units of temperature).
Some units do not fit this model (such as %) and are represented as dimensionless. In these cases, the dimensions are all set to zero.
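One possible way to hold a dimension in code is as a tuple of seven exponents. The sketch below assumes a hypothetical `Dimension` type and `same_dimension` helper (neither comes from any library) and encodes a few rows from the table above.

```python
from typing import NamedTuple


class Dimension(NamedTuple):
    """Exponents of the seven SI base units (each defaults to zero)."""
    s: int = 0
    m: int = 0
    kg: int = 0
    A: int = 0
    K: int = 0
    mol: int = 0
    cd: int = 0


# Rows from the table above, written as exponents.
CELSIUS = Dimension(K=1)
FAHRENHEIT = Dimension(K=1)
JOULE = Dimension(s=-2, m=2, kg=1)
LUX = Dimension(m=-2, cd=1)
PERCENT = Dimension()  # dimensionless: every exponent is zero


def same_dimension(a: Dimension, b: Dimension) -> bool:
    """True when both units have identical exponents for every base unit."""
    return a == b
```

Because a named tuple compares field by field, `same_dimension(CELSIUS, FAHRENHEIT)` is true while `same_dimension(JOULE, LUX)` is false.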
Unit Conversion
If units have the same dimensions, it is possible to convert from one to the other using scales and offsets. Most units I've come across only need to be scaled but some, such as temperature, require an offset too.
A generic formula is needed to handle all cases. Let's start with a simple expression that we know to be true:
$$1\,\text{m} = 100\,\text{cm}$$
Unfortunately, our system won't be able to evaluate this due to the unit strings (if it can, great: you probably don't need to read this page). If we strip out the unit strings, the equation is no longer valid:
$$1 \neq 100$$
This is where scales and offsets come in. If we represent meters as A and centimeters as B, we can use the numeric values (n), scales (s) and offsets (o) to re-balance our equation as follows:
$$n_A s_A + o_A = n_B s_B + o_B$$
$$1 \times 1 + 0 = 100 \times 0.01 + 0$$
You're probably wondering where I got the scales and offsets from. For a given set of dimensions, a base unit must be selected and assigned a scale and offset of 1 and 0 respectively. Other units with the same dimensions are then assigned scales and offsets relative to the base unit. In the example above, the base unit is meter.
A little algebra gives us a formula that we can use to compute the numeric value in one unit, given the numeric value in another unit and the scales and offsets:
$$n_B = \frac{n_A s_A + o_A - o_B}{s_B}$$
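In code, the conversion could be sketched as below. It assumes that `n * s + o` gives the value in the base unit for both units, which matches the temperature table that follows. The function name `convert` and the centimeter scale of 0.01 (relative to a meter base unit) are assumptions for this example.

```python
def convert(n_a: float, s_a: float, o_a: float, s_b: float, o_b: float) -> float:
    """Return the numeric value in unit B for the value n_a in unit A.

    Assumes n * s + o yields the value in the base unit for both A and B.
    """
    base_value = n_a * s_a + o_a     # value expressed in the base unit
    return (base_value - o_b) / s_b  # undo B's offset, then B's scale


# 1 m (scale 1, offset 0) expressed in centimeters (scale 0.01, offset 0).
centimeters = convert(1, 1, 0, 0.01, 0)
```

Going through the base unit means each unit only needs one scale and one offset, rather than a separate factor for every pair of units.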
As mentioned above, temperature units require both scales and offsets for conversion so let's use them as a more complete example:
Unit | Scale | Offset |
---|---|---|
Kelvin (K) | 1 | 0 |
Celsius (°C) | 1 | 273.15 |
Fahrenheit (°F) | 0.555555556 | 255.37222223 |
Let's say we want to convert 20°C to Fahrenheit:
$$n_{\text{°F}} = \frac{20 \times 1 + 273.15 - 255.37222223}{0.555555556} \approx 68$$
This gives us the correct answer: 20°C is about 68°F. Our equation works!
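The same calculation can be driven from a lookup table keyed by symbol. This is only a sketch: the names `TEMPERATURE_UNITS` and `convert_temperature` are hypothetical, and the scales and offsets are copied from the table above.

```python
# (scale, offset) relative to kelvin, copied from the table above.
TEMPERATURE_UNITS = {
    "K": (1.0, 0.0),
    "°C": (1.0, 273.15),
    "°F": (0.555555556, 255.37222223),
}


def convert_temperature(value: float, from_symbol: str, to_symbol: str) -> float:
    """Convert between temperature units by going through kelvin."""
    s_a, o_a = TEMPERATURE_UNITS[from_symbol]
    s_b, o_b = TEMPERATURE_UNITS[to_symbol]
    kelvin = value * s_a + o_a
    return (kelvin - o_b) / s_b


fahrenheit = convert_temperature(20, "°C", "°F")
```

Because the table values are rounded, the result is very close to 68 rather than exactly 68, so rounding for display is sensible.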
Unit Database
The only thing left to do is the small task of enumerating all of the units and defining their dimensions and relative scales and offsets.
Thankfully, the good people of Project Haystack maintain a database of units. Their unit system is where I learnt about the above unit conversion methodology.
A plain text file is used for the unit database. Any row in the following format defines a unit:
<name>, <alias>, <symbol>; <dimension>; <scale>; <offset>
In this database, each unit can have one or more unique identifiers.
The <symbol> is the abbreviated version of the unit. All units have a symbol.
The <name> is a descriptive summary of the unit using words separated by underscores, such as kilograms_per_second. Sometimes, the name is the same as the symbol, in which case the unit only has a symbol.
If there are multiple symbols for a unit, each can be included as an <alias>. The one assigned to the symbol should be the default.
Symbol | Name | Aliases |
---|---|---|
day | - | - |
kg/s | kilograms_per_second | - |
m² | square_meter | - |
h | hour | hr |
The <dimension> specifies the dimensions of the unit in the form of SI units and their powers. The dimensions of the example units above are shown below:
Unit | Dimensions |
---|---|
Celsius (°C) | K1 |
Fahrenheit (°F) | K1 |
Joule (J) | kg1*m2*sec-2 |
Pascal (Pa) | kg1*m-1*sec-2 |
Lux (lx) | m-2*cd1 |
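A dimension string in this form can be split into base units and powers. The sketch below assumes each token is a symbol made of letters followed by a signed integer, with tokens joined by `*`, as in the table above. The function name `parse_dimension` is hypothetical.

```python
import re
from typing import Dict

# Assumed token shape: letters followed by a signed integer power,
# for example "kg1", "m-1" or "sec-2".
_TOKEN = re.compile(r"([A-Za-z]+)(-?\d+)")


def parse_dimension(text: str) -> Dict[str, int]:
    """Parse a string such as 'kg1*m2*sec-2' into a symbol-to-power mapping."""
    exponents: Dict[str, int] = {}
    for token in text.split("*"):
        match = _TOKEN.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"unrecognised dimension token: {token!r}")
        symbol, power = match.groups()
        exponents[symbol] = int(power)
    return exponents


joule = parse_dimension("kg1*m2*sec-2")
```

Base units that do not appear in the string are simply absent from the mapping and can be treated as having a power of zero.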
The <scale> and <offset> are the numeric values for the scale and offset respectively. If omitted, a default of 1 and 0 is used.
For the temperature units above, their definition within the txt file is as follows:
fahrenheit, °F; K1; 0.5555555555555556; 255.37222222222223
celsius, °C; K1; 1.0; 273.15
kelvin, K; K1
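A row in this format could be parsed along the following lines. This is a sketch based only on the format described on this page: the `Unit` class and `parse_unit_line` function are hypothetical, and any other kinds of row the real file contains are not handled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Unit:
    ids: List[str]            # name, any aliases, then the symbol (last)
    dimension: Optional[str]  # raw dimension string such as "K1", if present
    scale: float = 1.0
    offset: float = 0.0

    @property
    def symbol(self) -> str:
        # By the format above, the symbol is the last identifier.
        return self.ids[-1]


def parse_unit_line(line: str) -> Unit:
    """Parse '<name>, <alias>, <symbol>; <dimension>; <scale>; <offset>'."""
    fields = [field.strip() for field in line.split(";")]
    ids = [part.strip() for part in fields[0].split(",")]
    dimension = fields[1] if len(fields) > 1 and fields[1] else None
    # Missing scale and offset fall back to the defaults of 1 and 0.
    scale = float(fields[2]) if len(fields) > 2 else 1.0
    offset = float(fields[3]) if len(fields) > 3 else 0.0
    return Unit(ids, dimension, scale, offset)


fahrenheit = parse_unit_line("fahrenheit, °F; K1; 0.5555555555555556; 255.37222222222223")
kelvin = parse_unit_line("kelvin, K; K1")
```

Here `fahrenheit.symbol` is `°F`, while `kelvin` picks up the default scale and offset.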
At the time of writing, this was the units.txt file.
Conclusion
With the conversion formula above and Project Haystack's unit database, it's fairly straightforward to build your own unit converter!