Columns of numerical values can often be efficiently compressed using two approaches:
- bit packing
- run-length encoding (RLE)
Bit packing uses the fact that small integers do not need a full 32 or 64 bits to be represented: if every value in a block fits in, say, 3 bits, several values can be packed into the space normally occupied by a single one. There are multiple ways to do this, and existing libraries implement them, for example:
- Daniel Lemire’s JavaFastPFOR library
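To make the idea concrete, here is a minimal sketch of fixed-width bit packing in Python. It is not how JavaFastPFOR works internally; it only illustrates the basic principle under simple assumptions: all values are non-negative, and every value in the block is stored with the same bit width. The function names `bit_width`, `pack`, and `unpack` are made up for this example.

```python
def bit_width(values):
    """Smallest number of bits that can hold every value (at least 1)."""
    return max(1, max(values, default=0).bit_length())


def pack(values, width):
    """Pack non-negative integers into bytes, using `width` bits per value."""
    acc = 0
    for i, v in enumerate(values):
        if v < 0 or v >> width:
            raise ValueError(f"{v} does not fit in {width} bits")
        # Place value i at bit offset i * width.
        acc |= v << (i * width)
    nbytes = (len(values) * width + 7) // 8  # round up to whole bytes
    return acc.to_bytes(nbytes, "little")


def unpack(data, width, count):
    """Recover `count` values of `width` bits each from packed bytes."""
    acc = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(acc >> (i * width)) & mask for i in range(count)]


values = [3, 0, 7, 5, 1, 2, 6, 4]
width = bit_width(values)    # largest value is 7, so 3 bits are enough
packed = pack(values, width)
restored = unpack(packed, width, len(values))
```

In this example the eight values occupy 3 bytes in packed form, compared with 32 bytes when each one is stored as a 32-bit integer. A real implementation would also have to store the bit width and the value count alongside the packed data.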
Run-length encoding (RLE)
Run-length encoding turns “runs” of the same value, meaning multiple occurrences of the same value in a row, into just a pair of numbers: the value, and the number of times it is repeated.
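As an illustration, the following is a minimal sketch of run-length encoding and decoding in Python, using `itertools.groupby` from the standard library to find the runs. The function names `rle_encode` and `rle_decode` are made up for this example, and representing each run as a `(value, count)` tuple is just one possible layout.

```python
from itertools import groupby


def rle_encode(values):
    """Turn each run of equal consecutive values into a (value, count) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


column = [7, 7, 7, 7, 2, 2, 9]
encoded = rle_encode(column)   # [(7, 4), (2, 2), (9, 1)]
decoded = rle_decode(encoded)
```

Note that the last run, a single 9, takes two numbers to store instead of one. RLE pays off when runs are long, and can make the data larger when values rarely repeat.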