Datawarehouse Data Modeling - Data Vault

1 - About

The data vault (DV) is a relational data model technique used in the persistence layer of a datawarehouse

The Data Vault consists of three primary entity types:

  • Hubs: They are core business keys,
  • Links: They form all relationships,
  • and Satellites: They provide all descriptive information
The data vault is a detailed oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It's an hybrid approach encompassing the best of breed between 3rd normal form (3nf) and star schema. The design is flexible, scalable and adaptable of the needs of enterprise.

3 - Naming Convention

Naming convention allow management, identification and monitoring of the entire system.

3.1 - Table

Suffix Description
HUB or H HUB tables
LNK or L Standard Link Tables
HLNK or HL Hierarchical Links
LSA, SAL, SLNK, SL Same as Links
TLNK, TL Transactional Links
SAT. or S Generic Satellites
HSAT Hub Satellites
LSAT Link Satellites
PIT, or P Point-in-time tables
BR or B Bridge tables
REF or R Reference tables

3.2 - Column

The naming convention for the standardized fields is:

Suffix Description
LDTS, LDT Load date time stamps
LEDTS, LEDT Load end date time stamps
SQN Sequence numbers
REC_SRC, RSRC Record sources
LSD, LSDT Last seen dates
SSQN Sub-sequencing identifiers

4 - Performance Note

A full Data Vault loading cycle might take 40 or 50 minutes total time, (50 staging tables, 1 terabyte incomming data) – where a so-called “traditional” (type 2 dimension + facts) loading cycle for the same data sets would take about 3 to 4 hours…. It can all happen in PARALLEL…

5 - Documentation / Reference

Data Science
Data Analysis
Data Science
Linear Algebra Mathematics

Powered by ComboStrap