Datawarehouse Data Modeling - Data Vault
About
The data vault (DV) is a relational data model technique used in the persistence layer of a datawarehouse
The Data Vault consists of three primary entity types:
- Hubs: They are core business keys,
- Links: They form all relationships,
- and Satellites: They provide all descriptive information
The data vault is a detailed oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business.
It's an hybrid approach encompassing the best of breed between 3rd normal form (3nf) and star schema. The design is flexible, scalable and adaptable of the needs of enterprise.
Articles Related
Naming Convention
Naming convention allow management, identification and monitoring of the entire system.
Table
Suffix | Description |
---|---|
HUB or H | HUB tables |
LNK or L | Standard Link Tables |
HLNK or HL | Hierarchical Links |
LSA, SAL, SLNK, SL | Same as Links |
TLNK, TL | Transactional Links |
SAT. or S | Generic Satellites |
HSAT | Hub Satellites |
LSAT | Link Satellites |
PIT, or P | Point-in-time tables |
BR or B | Bridge tables |
REF or R | Reference tables |
Column
The naming convention for the standardized fields is:
Suffix | Description |
---|---|
LDTS, LDT | Load date time stamps |
LEDTS, LEDT | Load end date time stamps |
SQN | Sequence numbers |
REC_SRC, RSRC | Record sources |
LSD, LSDT | Last seen dates |
SSQN | Sub-sequencing identifiers |
Performance Note
A full Data Vault loading cycle might take 40 or 50 minutes total time, (50 staging tables, 1 terabyte incomming data) – where a so-called “traditional” (type 2 dimension + facts) loading cycle for the same data sets would take about 3 to 4 hours…. It can all happen in PARALLEL…