A dimension is a part of a dimensional schema and provide the basis for analyzing data through slicing and dicing
A dimension is just a set of descriptif attribute that are generally grouped:
- in a relational relationship.
- in a set of key pair value.
The below article focus on the relational side of dimension but the concept are the same.
Dimensions have descriptif attribute to filter/group on whereas measures are attribute to aggregate over
Dimensions are hierarchical, not flat. Dimensions consists of one or more hierarchies included two (Total and Details) or more levels
They help the business to see the data from different perspectives. (Perspectives is another word for dimension)
Dimension tables tend to be :
- relatively shallow in terms of number of row (often far fewer than 1 million rows)
- but are wide with many large columns.
Each dimension is defined by its primary key (PK) which serves for the referential integrity with any given fact table to which is joined.
The seven plus or minus two principle could be used to define a limit on the number of dimensions in a star schema, to ensure that it represents a conceptually manageable “chunk” of data. Kimball’s rule of thumb is that less than 15 dimensions.
Performance or resource issues are NEVER valid reasons to use a dimension or not, use of a dimension must be based on the business purpose and use of the field. For example, there may be a set of business rules that dictate how a date should be presented in reports. Providing text fields in a date dimension formatted to the proper representation greatly simplifies enforcing (as well as changing) such rules.
Dimensions are categorizations of data, and the categorizations reflect how a business analyst wants to analyze data. When analysts say they want to see numbers “by” something or “over” something, they are identifying the dimensions of the data.
How to find a dimension
Dimension fall out the question :
How do businesspeople describe the data that results from the business process ?
Dimensions typically answer who, what, when, where, how, and why questions about the business events stored in the fact table.
|Question||Example of Dimension||Type of dimension|
|Who||Customer, Employees||Third party|
To verify that you have chosen the good dimensions, just make a Cartesian join on the dimension business keys. All combinations must be meaningful to the business. If one dimension is constrained by an other, you will have a good chance that they are in real one dimension.
When the dimensional schemas use exactly the same dimension tables, these table are called “conforming” dimensions (Kimball, 1996, 2002).
This ensures that reports can:
- “drill across” from one star to an another.
- join/choose measures from different stars (from different level of granularity of from different business event)
Conforming dimensions means establishing common dimensional attributes across separated fact tables so that “drill across” reports can be generated using these attributes.
The primary key and unique key of each dimension should be a simple single attribute.
If a unique key contain more than one attribute, it means generally that it's time related and must be in this cas incorporated to the fact table.
In most cases, The primary key and unique key will just be the key of the underlying component entity.
In the case of slowly changing dimensions, a primary key is generally used to create represent the state of each entity over time.
Dimension tables are formed by denormalizing reference/dimension table and hierarchies table together. The resulting dimension table consists of the union of all attributes.
This process introduces redundancy in the form of transitive dependencies, which are violations to third normal form (3NF) (Codd, 1970). This means that the resulting dimension table is in second normal form (2NF).
Example of common dimension :
Date and time are almost completely independent. If we combined the two dimensions, the date dimension would growth sigificantly ( 5256000 rows = 10 years * 365 days * 1440 minutes ). Obviously, it's preferable to create a 3650 row date dimension and a separate 1440 row time-of-day by minute dimension.
There are more than 31 million seconds in a year and you will end up with a 86,400 records if the time grain is seconds.
The product dimension describes every product unit.
Many of the attributes in the product dimension table are not part of the merchandise hierarchy. The package attribute for example might have values such as Bottle, Bag, Box or others.
A reasonable product dimension table would have 50 or more descriptive attributes.
The product dimension is one of the two or three primary dimensions in nearly every data mart. Great care should be taken to fill this dimension with as many descriptive attributes as possible.
The store dimension is a geographic dimension. Each store can be thought of as a location. Because of this, we can roll stores up to any geographic attribute, such as ZIP code, county and state. See the spatial section, to know how to render the geographic data.
This dimension is often called as causal dimension (as opposed to a casual dimension) because it describes factors thought to cause a change in product sales.
Regular promotional events usually are not handled in the data table but rather by means of a promotion dimension, especially since promotional events are not defined solely by date but usually are defined by a combination of date, product and store.
- Causal dimension (as opposed to a casual dimension) because it describes factors thought to cause a change in product sales (Example : Promotional Dimension)
- Casual dimension
- Geographic dimension (Example : Store Dimension)
- Junk dimensions (known also as mystery or garbage dimension): dimensions that contain miscellaneous data (like flags and indicators) that do not fit in the base dimension table.
- Mini Dimension: Assuming we have a customer dimension with millions of rows, we need to use minidimensions to track customer attribute changes because a type 2 slowly changing dimension technique isn’t effective due to the large number of additional rows required to support all the change. The mini-dimension technique uses a separate dimension(s) for the attributes that frequently change.
- Role-Playing Dimensions. Dimensions are often recycled for multiple applications within the same database. For instance, a “Date” dimension can be used for “Date of Sale”, as well as “Date of Delivery”, or “Date of Hire”. This is often referred to as a “role-playing dimension”.
Dimension table space requirements pale in comparison with fact table space considerations.
Why I have to define besides a denormalized dimension table a dimension object in the database ?
Because the evaluation of the dimension information such as :
- unique value (which fall out from the first level)
- 1:n relationships (which fall out from the hierarchy)
plays an important role for the rewrite process using summary table. (see the query rewrite capability on Materialized view for several example : Oracle Database - Materialized view (Snapshot))