Dimensional Data Modeling - Degenerate Dimension of Fact dimension (i.e. event/header entity)

1 - Fact Dimension

In a dimensional model, you may found a dimension table:

  • with a cardinality (distinct value is higher than 10,000). <note tip>A high cardinality means generally that the values are different for almost each transaction. For example: order ID, order line ID, notes).</note>
  • which has a many-to-one relationship with the fact table close to a one-to-one and then grow roughly at the same rate as the fact table

These kind of table are often called:

  • fact dimensions
  • event facts
  • factless fact table
  • header facts

as they contains often attribute that are related to the event of the fact table without containing a measure.

In an OLTP environment, you may find them as header tables of transaction lines such as sales, invoice.

In this dimension table, you may find only two different data type attributes:

  • a character data type known/referenced as textual facts. Textual facts are useful for grouping together related fact table rows, such as all the rows that are related to a particular invoice number.
  • a time data type
  • but <note important>you may not find number attribute that you can aggregate (i.e. with a SUM for instance).</note> of it's already an aggregation of an other table (such as a sold for a customer).

3 - Degenerate Dimension

To avoid an expensive join between this fact dimension and the fact table, the attribute of this dimension are degenerate i.e. moved from the dimension to the fact table and can then be classified as:

  • Fact attributes
  • Degenerate attribute

For example of degenerate attribute:

  • Order Numbers,
  • Invoice Number,
  • Credit-Debit Indicator
  • etc
Do not confuse ! Degenerate dimensions can in articles referred to a fact dimension.

4 - Pros and Cons of the degeneration

4.1 - Pros

  • Avoid a expensive join between two tables with a one-to-many relationship but which is close of a one-to-one relationship
  • By degenerating a date dimension attribute, you avoid the need of a join between the event dimension and a time dimension (which enable the drill through the time hierarchy)

4.2 - Cons: Increase the size of the fact table

By moving the dimensional attribute in the fact table, you increase the need of storage and you may use a partial degeneration

5 - Partial degeneration

To overcome the increase of capacity need when you degenerate a complete fact dimension, you can made a partial degeneration of the fact dimension by choosing to degenerate only a set of attribute with a high analytic value (i.e often used in reporting).

While it is generally not considered good dimensional modeling practice to create a dimension with a potential one-to-one relationship with the fact table, in this situation it's a reasonable tradeoff. This design tradeoff works because the reference dimension should very seldom actually join back to the fact table.

6 - Documentation / Reference

Data Science
Data Analysis
Data Science
Linear Algebra Mathematics

Powered by ComboStrap