The data vault (DV) is a relational data model technique used in the persistence layer of a datawarehouse
The Data Vault consists of three primary entity types:
The data vault is a detailed oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business.
It's an hybrid approach encompassing the best of breed between 3rd normal form (3nf) and star schema. The design is flexible, scalable and adaptable of the needs of enterprise.
Naming convention allow management, identification and monitoring of the entire system.
Suffix | Description |
---|---|
HUB or H | HUB tables |
LNK or L | Standard Link Tables |
HLNK or HL | Hierarchical Links |
LSA, SAL, SLNK, SL | Same as Links |
TLNK, TL | Transactional Links |
SAT. or S | Generic Satellites |
HSAT | Hub Satellites |
LSAT | Link Satellites |
PIT, or P | Point-in-time tables |
BR or B | Bridge tables |
REF or R | Reference tables |
The naming convention for the standardized fields is:
Suffix | Description |
---|---|
LDTS, LDT | Load date time stamps |
LEDTS, LEDT | Load end date time stamps |
SQN | Sequence numbers |
REC_SRC, RSRC | Record sources |
LSD, LSDT | Last seen dates |
SSQN | Sub-sequencing identifiers |
A full Data Vault loading cycle might take 40 or 50 minutes total time, (50 staging tables, 1 terabyte incomming data) – where a so-called “traditional” (type 2 dimension + facts) loading cycle for the same data sets would take about 3 to 4 hours…. It can all happen in PARALLEL…