About
Data compression or source coding is the process of encoding information using fewer bits (or other information-bearing units) than an unencoded representation would use through use of specific encoding schemes.
Articles Related
Method
- run-length encoding,
- cluster coding
- and dictionary coding.
Dictionary encoding
Columns are stored as sequences of bit-coded integers.
A check for equality can then be executed on the integers; for example, during scans or join operations. This is much faster than comparing, for example, string values.
Example
- If a column is sorted, often there are repeated adjacent values
Algorithm
- https://github.com/facebook/zstd - Zstandard, or zstd as short version, is a fast lossless compression algorithm, targeting real-time compression scenarios at zlib-level and better compression ratios.