Denormalization should not be the first choice for improving performance; it should be reserved for fine-tuning a database for a particular application.
Why denormalize in OLAP applications?
Databases intended for Online Transaction Processing (OLTP) are typically more normalized than databases intended for Online Analytical Processing (OLAP). Normalization tends to multiply the number of tables, as it involves splitting out functionally dependent groups of attributes into separate tables. Normalization results in databases that are optimized for update rather than retrieval.
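The splitting described above can be sketched with an illustrative schema (table and column names are assumptions, not from the source): the customer attributes, which depend only on the customer, are moved out of the orders table, so each fact is stored once and retrieval requires a join.

```python
import sqlite3

# Hypothetical normalized schema: the functionally dependent customer
# attributes (name, city depend only on customer_id) are split out of
# the orders table into their own table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    );
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'London')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 9.99), (11, 1, 24.50)])

# Retrieval now needs a join; the customer's city is stored exactly once.
rows = conn.execute("""
    SELECT o.order_id, c.city
    FROM orders o JOIN customers c ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()
print(rows)
```

Note how normalization has multiplied the number of tables: one logical record ("order by customer in city") now spans two tables.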
OLTP applications are characterized by a high volume of small transactions, such as updating a sales record at a supermarket checkout counter.
The expectation is that each database transaction will leave the database in a consistent state: because each piece of data is stored in a single place, insertion, deletion, and update anomalies are prevented.
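The "single place" property can be sketched on a toy normalized schema (the table names are illustrative): one UPDATE to one row changes the value seen by every dependent record, so no copy can be left stale.

```python
import sqlite3

# Sketch of the "each change in a single place" property of a
# normalized schema. Table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id)
    );
    INSERT INTO customers VALUES (1, 'London');
    INSERT INTO orders VALUES (10, 1), (11, 1);
""")

# One change, made in exactly one place...
conn.execute("UPDATE customers SET city = 'Paris' WHERE customer_id = 1")

# ...and every order now reads the new value: no update anomaly.
cities = conn.execute("""
    SELECT DISTINCT c.city
    FROM orders o JOIN customers c ON o.customer_id = c.customer_id
""").fetchall()
print(cities)
```

Had the city been copied into every order row, the same change would have required touching each copy, and missing one would leave the database inconsistent.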
By contrast, databases intended for OLAP operations are primarily "read mostly" databases. OLAP applications tend to extract historical data that has accumulated over a long period of time. For such databases, redundant or "denormalized" data may facilitate business intelligence applications by avoiding joins and permitting the star transformation method.
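A minimal star-schema sketch makes the trade-off concrete (all names here are illustrative): the dimension table deliberately carries redundant attributes, so an analytic query needs only one level of join from the fact table instead of a chain of joins through a normalized hierarchy.

```python
import sqlite3

# Minimal star-schema sketch. The dimension is denormalized:
# category_name is repeated per product instead of living in a
# separate category table, so queries avoid an extra join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        product_id    INTEGER PRIMARY KEY,
        product_name  TEXT,
        category_name TEXT   -- redundant, denormalized attribute
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'),
                                   (2, 'Ink', 'Stationery');
    INSERT INTO fact_sales VALUES (1, 2.0), (2, 5.0), (1, 3.0);
""")

# Revenue per category: a single fact-to-dimension join, no chain of
# joins through a normalized category hierarchy.
rows = conn.execute("""
    SELECT p.category_name, SUM(f.amount)
    FROM fact_sales f JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY p.category_name
""").fetchall()
print(rows)
```

In the snowflake (normalized) alternative, `category_name` would sit in its own table and the same query would need a second join.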
The normalized alternative to the star schema is the snowflake schema. It has never been proven whether this denormalization itself provides any increase in performance, or whether the concurrent removal of data constraints is what increases the performance. In many cases, the need for denormalization has waned as computers and RDBMS software have become more powerful, but since data volumes have generally increased along with hardware and software performance, OLAP databases often still use denormalized schemas.
Denormalization is also used to improve performance on smaller computers, such as computerized cash registers and mobile devices, since these may use the data for look-up only (e.g. price lookups). Denormalization may also be used when no RDBMS exists for a platform (such as Palm), or when no changes are to be made to the data and a swift response is crucial.
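For the look-up-only case, a sketch without any RDBMS at all can suffice: a single flat mapping from barcode to product data, built once off-device and never updated on it. All values and names here are illustrative.

```python
# Look-up-only denormalized store for a device without an RDBMS:
# one flat mapping from barcode to (name, price), loaded once.
# Barcodes, names, and prices below are made-up sample data.
PRICE_TABLE = {
    "4006381333931": ("Pen, blue", 1.49),
    "4006381333948": ("Notebook A5", 3.99),
}

def price_lookup(barcode: str) -> float:
    """Constant-time read; no joins, no transactions, no update path."""
    name, price = PRICE_TABLE[barcode]
    return price

print(price_lookup("4006381333931"))  # 1.49
```

Because the data is never modified on the device, the update anomalies that normalization guards against simply cannot occur, so the redundancy costs nothing.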
How to denormalize?
- Start from a prior normalization
- Use knowledge of data usage:
  - How often are two data items needed together?
  - How many rows are involved?
  - What are the required response time and frequency of a query?
- Minimize the need for joins
- Reduce the number of tables
- Reduce the number of foreign keys
- Reduce the number of indices
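The goals above can be sketched on a toy schema (names are illustrative): prejoining two normalized tables into one wide table removes a table, a foreign key, and the join that retrieval previously required.

```python
import sqlite3

# Sketch of denormalizing by prejoining: merge the customers table
# into the orders data, trading redundancy for simpler retrieval.
# Table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount      REAL
    );
    INSERT INTO customers VALUES (1, 'London');
    INSERT INTO orders VALUES (10, 1, 9.99), (11, 1, 24.50);

    -- Denormalize: prejoin into a single wide table.
    CREATE TABLE orders_denorm AS
        SELECT o.order_id, o.amount, c.city
        FROM orders o JOIN customers c ON o.customer_id = c.customer_id;
""")

# Retrieval is now a single-table scan: no join, no foreign key.
rows = conn.execute(
    "SELECT order_id, city FROM orders_denorm ORDER BY order_id"
).fetchall()
print(rows)
```

The cost is the one normalization was designed to avoid: the city is now stored once per order, so any change to it must touch every copy.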