Relational Data Modeling - Normal Forms

1 - Type of Normal Form

A relational database is :

  • in First Normal Form (1NF) if each attribute is single-valued with atomic values.
  • in Second Normal Form (2NF) if it is in 1NF and each attribute that is not a primary key is fully functionally dependent on the entity's primary key.
  • in Third Normal Form (3NF) if it is in 2NF and no non-primary key attribute is functionally dependent on another non-primary key.
Higher order normal forms are usually more desirable than lower order normal forms.

3 - Third Normal Form (3NF)

3.1 - 3NF for large data warehouse

3NF schemas are typically chosen for large data warehouses, especially environments with significant data-loading requirements that are used to feed data marts and execute long-running queries.

The main advantages of 3NF schemas are that they:

  • Provide a neutral schema design, independent of any application or data-usage considerations
  • May require less data-transformation than denormalized schemas such as star schemas

3.2 - Optimizing Third Normal Form Queries

Queries on 3NF schemas are often very complex and involve a large number of tables. The performance of joins between large tables is thus a primary consideration when using 3NF schemas.

One particularly important feature for 3NF schemas is partition-wise joins. The largest tables in a 3NF schema should be partitioned to enable partition-wise joins. The most common partitioning technique in these environments is composite range-hash partitioning for the largest tables, with the most-common join key chosen as the hash-partitioning key.

Optimizing a 3NF schema requires the three Ps:

4 - Documentation / Reference

Data Science
Data Analysis
Data Science
Linear Algebra Mathematics

Powered by ComboStrap