The grain is the combination of all lower hierarchy level of the descriptif attribute of a relation (table or query).
- If you query only a table, the unique key of this table is the grain
- If you query only a dimension, the lower hierarchy level identifier is the grain
At the top level, there are two main options in choosing the level of granularity:
- Unsummarized/Atomic (transaction level granularity): this is the highest level of granularity where each fact table row corresponds to a single transaction or line item
- Summarized: transactions may be summarized by a subset of dimensions or dimensional attributes. In this case, each row in the fact table corresponds to multiple transactions
The most granular or atomic data (atomic as an indivisible unit of work) has the most dimensionality. Atomic data is highly dimensional. Preferably, you should develop dimensional models for the most atomic information captured by a Event (Timed Measure|Action). Atomic data is the most detailled information collected: such data cannot be subdivided further.
Example: If a high grain is the month whereas a low or detail grain can be the day
A data warehouse almost always demands data to the lowest possible grain of each dimension not because queries want to see individual low level rows but because queries need to cut through the details in very precise ways.
The lower the level of granularity (or conversely, the higher the level of summarization), the less storage space required and the faster queries will be executed.
Example of grain
- An individual line item on a customer's retail sales ticket as measured by a scanner device
- A line item on a bill received from a doctor
- An individual boarding pass to get on a flight
- A daily snapshot of the inventory levels for each product in a warehouse
- A monthly snapshot for each bank account