Denormalization should not be the first choice for improving performance; it should be reserved for fine-tuning a database for a particular application.
Why denormalize in OLAP applications?
Databases intended for Online Transaction Processing (OLTP) are typically more normalized than databases intended for Online Analytical Processing (OLAP). Normalization tends to multiply the number of tables, as it involves splitting out functionally dependent groups of attributes into separate tables. Normalization results in databases that are optimized for update rather than retrieval.
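The splitting described above can be sketched with an illustrative schema (table and column names are assumptions, not from the source): the customer attributes, which depend only on the customer, are moved out of the orders table, so each fact is stored once and retrieval requires a join.

```python
import sqlite3

# Hypothetical normalized schema: the functionally dependent customer
# attributes (name, city depend only on customer_id) are split out of
# the orders table into their own table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    );
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'London')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 9.99), (11, 1, 24.50)])

# Retrieval now needs a join; the customer's city is stored exactly once.
rows = conn.execute("""
    SELECT o.order_id, c.city
    FROM orders o JOIN customers c ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()
print(rows)
```

Note how normalization has multiplied the number of tables: one logical record ("order by customer in city") now spans two tables.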
OLTP applications are characterized by a high volume of small transactions, such as updating a sales record at a supermarket checkout counter.
The expectation is that each database transaction will leave the database in a consistent state: because each piece of data is stored in a single place, insertion, deletion, and update anomalies are prevented.
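The "single place" property can be sketched on a toy normalized schema (the table names are illustrative): one UPDATE to one row changes the value seen by every dependent record, so no copy can be left stale.

```python
import sqlite3

# Sketch of the "each change in a single place" property of a
# normalized schema. Table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id)
    );
    INSERT INTO customers VALUES (1, 'London');
    INSERT INTO orders VALUES (10, 1), (11, 1);
""")

# One change, made in exactly one place...
conn.execute("UPDATE customers SET city = 'Paris' WHERE customer_id = 1")

# ...and every order now reads the new value: no update anomaly.
cities = conn.execute("""
    SELECT DISTINCT c.city
    FROM orders o JOIN customers c ON o.customer_id = c.customer_id
""").fetchall()
print(cities)
```

Had the city been copied into every order row, the same change would have required touching each copy, and missing one would leave the database inconsistent.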
By contrast, databases intended for OLAP operations are primarily "read mostly" databases. OLAP applications tend to extract historical data that has accumulated over a long period of time. For such databases, redundant or "denormalized" data may facilitate business intelligence applications by avoiding joins and permitting the star transformation method.
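A minimal star-schema sketch makes the trade-off concrete (all names here are illustrative): the dimension table deliberately carries redundant attributes, so an analytic query needs only one level of join from the fact table instead of a chain of joins through a normalized hierarchy.

```python
import sqlite3

# Minimal star-schema sketch. The dimension is denormalized:
# category_name is repeated per product instead of living in a
# separate category table, so queries avoid an extra join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        product_id    INTEGER PRIMARY KEY,
        product_name  TEXT,
        category_name TEXT   -- redundant, denormalized attribute
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'),
                                   (2, 'Ink', 'Stationery');
    INSERT INTO fact_sales VALUES (1, 2.0), (2, 5.0), (1, 3.0);
""")

# Revenue per category: a single fact-to-dimension join, no chain of
# joins through a normalized category hierarchy.
rows = conn.execute("""
    SELECT p.category_name, SUM(f.amount)
    FROM fact_sales f JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY p.category_name
""").fetchall()
print(rows)
```

In the snowflake (normalized) alternative, `category_name` would sit in its own table and the same query would need a second join.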
The normalized alternative to the star schema is the snowflake schema. It has never been proven whether this denormalization itself provides any increase in performance, or whether the concurrent removal of data constraints is what increases the performance. In many cases, the need for denormalization has waned as computers and RDBMS software have become more powerful, but since data volumes have generally increased along with hardware and software performance, OLAP databases often still use denormalized schemas.
Denormalization is also used to improve performance on smaller computers, such as computerized cash registers and mobile devices, since these may use the data for look-up only (e.g. price lookups). Denormalization may also be used when no RDBMS exists for a platform (such as Palm), or when no changes are to be made to the data and a swift response is crucial.
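For the look-up-only case, a sketch without any RDBMS at all can suffice: a single flat mapping from barcode to product data, built once off-device and never updated on it. All values and names here are illustrative.

```python
# Look-up-only denormalized store for a device without an RDBMS:
# one flat mapping from barcode to (name, price), loaded once.
# Barcodes, names, and prices below are made-up sample data.
PRICE_TABLE = {
    "4006381333931": ("Pen, blue", 1.49),
    "4006381333948": ("Notebook A5", 3.99),
}

def price_lookup(barcode: str) -> float:
    """Constant-time read; no joins, no transactions, no update path."""
    name, price = PRICE_TABLE[barcode]
    return price

print(price_lookup("4006381333931"))  # 1.49
```

Because the data is never modified on the device, the update anomalies that normalization guards against simply cannot occur, so the redundancy costs nothing.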
How to denormalize?
- Start from a prior normalization
- Use knowledge of data usage:
  - How often are two data items needed together?
  - How many rows are involved?
  - What are the required response time and frequency of a query?
- Minimize the need for joins
- Reduce the number of tables
- Reduce the number of foreign keys
- Reduce the number of indices
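The goals above can be sketched on a toy schema (names are illustrative): prejoining two normalized tables into one wide table removes a table, a foreign key, and the join that retrieval previously required.

```python
import sqlite3

# Sketch of denormalizing by prejoining: merge the customers table
# into the orders data, trading redundancy for simpler retrieval.
# Table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount      REAL
    );
    INSERT INTO customers VALUES (1, 'London');
    INSERT INTO orders VALUES (10, 1, 9.99), (11, 1, 24.50);

    -- Denormalize: prejoin into a single wide table.
    CREATE TABLE orders_denorm AS
        SELECT o.order_id, o.amount, c.city
        FROM orders o JOIN customers c ON o.customer_id = c.customer_id;
""")

# Retrieval is now a single-table scan: no join, no foreign key.
rows = conn.execute(
    "SELECT order_id, city FROM orders_denorm ORDER BY order_id"
).fetchall()
print(rows)
```

The cost is the one normalization was designed to avoid: the city is now stored once per order, so any change to it must touch every copy.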