Relational Data Modeling - Database Normalization

Data System Architecture

Definition

Normalization is a logical data base design method. Normalization is a process of systematically breaking a complex table into simpler ones. It is built around the concept of normal forms.

Database normalization is a data normalization process that reduce data to its canonical form.

Purposes of Normalization

In the design of a data model, normalization is the process of adjusting table and relations to:

  • eliminate certain types of data (redundancy|replication) to improve consistency,
  • produce a clearer and readable data model.
  • provide maximum flexibility to meet future information needs by keeping tables corresponding to object types in their simplified forms.
  • avoid update anomalies

An update anomaly is a problem when:

  • inserting (no place to insert new information),
  • deleting (lost of information),
  • or updating (inconsistency may occur because of the existence of data redundancy)

a database because of the structure of the relations.

Normalization: Pros and Cons

Pros

  • Reduce data redundancy & space required
  • Enhance data consistency
  • Enforce data integrity
  • Reduce update cost
  • Provide maximum flexibility in responding ad hoc queries
  • Allow the use of parallelism,
  • Can reduce the total number of rows per block.

Improve:

Cons

  • Many complex queries will be slower because joins have to be performed to retrieve relevant data from several normalized tables
  • Programmers/users have to understand the underlying data model of an database application in order to perform proper model of an database application in order to perform proper joins among several tables
  • The formulation of multiple-level queries is a non-trivial task.

Normalization Process

Normalization steps

Normalization Steps

3NF/2NF Normalization Example

An example of a 2NF table that fails to meet the requirements of 3NF is:

Tournament Winners Tournament

Tournament Year Winner Date of Birth
Indiana Invitational 1998 Al Fredrickson 21 July 1975
Cleveland Open 1999 Bob Albertson 28 September 1968
Des Moines Masters 1999 Al Fredrickson 21 July 1975
Indiana Invitational 1999 Chip Masterson 14 March 1977

Because each row in the table needs to tell us who won a particular Tournament in a particular Year, the composite key {Tournament, Year} is a minimal set of attributes guaranteed to uniquely identify a row. That is, {Tournament, Year} is a candidate key for the table.

The breach of 3NF occurs because the non-prime attribute Winner Date of Birth is transitively dependent on the candidate key {Tournament, Year} via the non-prime attribute Winner. The fact that Winner Date of Birth is functionally dependent on Winner makes the table vulnerable to logical inconsistencies, as there is nothing to stop the same person from being shown with different dates of birth on different records.

In order to express the same facts without violating 3NF, it is necessary to split the table into two:

Tournament Winners

Tournament Year Winner
Indiana Invitational 1998 Al Fredrickson
Cleveland Open 1999 Bob Albertson
Des Moines Masters 1999 Al Fredrickson
Indiana Invitational 1999 Chip Masterson

Player Dates of Birth

Player Date of Birth
Chip Masterson 14 March 1977
Al Fredrickson 21 July 1975
Bob Albertson 28 September 1968

Update anomalies cannot occur in these tables, which are both in 3NF.

Reference





Discover More
Testing Infrastructure
Code - Testing (Software Quality Assurance|SQA|Validator|Checker)

A test is performed to verify that the system is conformed to the specification and is the most important part of code quality. In a “Test-driven_developmenttest-driven software development (tdd)”...
Thomas Bayes
Data Mining - Data (Preparation | Wrangling | Munging)

Data Preparation is a data step that prepares your data for further analyis. It's a key factor in any data project: mining, ai analytics Preparing has several steps that are explained below. ...
Dataquality Metrics
Data Quality - Data Rules

Data rules are rule that can have various designations such as: business rules (in the data modeling), data test, quality screen. They follow the same concept than the rules from an event driven...
Oltp Dwh
Data Warehousing - Contrasting OLTP and Data Warehousing Environments

One major difference between the types of system is that data warehouses are not usually in third normal form (3NF), a type of data normalization common in Online Transaction Processing (OLTP) environments....
Star Schema
Dimensional Data Modeling - Hierarchy

in dimensional data modeling A hierarchy is a set of levels having many-to-one relationships between each other, and the set of levels collectively makes up a dimension in the form of a tree (A tree...
Snowflakeschema
Dimensional Data Modeling - Snowflake schema

A snowflake schema is a star schema with fully normalised (3NF) dimensions. It gets its name from that it has a similar shape than a snowflake. A snowflake is a dimensional model : in which a central...
Relational Data Model
RDBMS - Relational Model

The relational model for database management: is a set of principles for relational databases formalized by Dr. E.F. Codd in the late 1960s. is a database model based on first-order predicate logic...
Data System Architecture
Relational Data Modeling

This section is data Modeling applied to a relational structure (mostly table). Relational databases normalize data, removing as much duplication as possible. A relational database’s approach to...
Data System Architecture
Relational Data Modeling - Denormalization

Denormalization is the process of intentionally backing away from normalization to improve performance by suppressing join and permitting the use of star transformation technique. Denormalization should...
Data System Architecture
Relational Data Modeling - Denormalization Methods (Changing the Database Design to Reduce the Number of Joins)

You can reduce the number of joins performed during queries by denormalizing the data within your application. In a normalized database, the attributes of a table relate directly to the fullprimary kedenormalized...



Share this page:
Follow us:
Task Runner