Dimensional Data Modeling - Snowflake schema

About

A snowflake schema is a star schema with fully normalised (3NF) dimensions. It gets its name from that it has a similar shape than a snowflake.

A snowflake is a dimensional model :

in which a central fact is surrounded by a perimeter of dimensions and at least one of its dimensions keeps its dimension levels separate.
where one or more of the dimension are normalized to some extent. The difference between the type of normalization in a snowflake schema and in an E-R schema is that the snowflake normalization is based on business hierarchy attributes.

The advantage of the snowflake structure against a star schema is that it explicitly shows the hierarchical structure of each dimension rather than appearing as an unstructured collection of data items requiring then tacit knowledge on the part of the user/designer.

One reason to snowflake is to change a complex or large dimension into a more manageable set of smaller dimensions. The snowflake structure will reduce batch updates to dimensions.

Though always said to be slower than a star, some tests have revealed no difference in performance between flattened and snowflaked dimensions. In fact in some cases, the snowflake provides superior performance, such as when a wide dimension (i.e., customer) is segmented into a snowflake.

When you need to model a many-to-many relationship in a dimensional model, you fall in a snowflake schema.

Articles Related

Case where you need a snowflake schema

Consider the follwing example below. Ralph Kimball has identified it as one of three allowable cases for a snowflake.

Split the attributes with a One-to-Many RelationShip

There are 300 attributes in a customer dimension and the attributes naturally fall into three classes: base customer, customer demographic and customer financial attributes.

Many-to-Many RelationShip

Business situations :

In banking, a customer can have different accounts, and an account can belong to different customers.
In insurance, a customer (or household) can have different policies, but a policy can support multiple customers (or households). In many businesses, a parent customer can have subsidiaries.

Such business situations do not fit into the mold of a star schema, because the relationship across the data can not readily be flattened out into a 1: M without some loss of meaning. You need then to model a Many-to-Many relation ship with a third table (often called bridge table). More …

Management of change

Building a snowflake of a 3NF model does not guarantee that it's immune to business change. For example, a hierarchy might be simply represented with one table for each level of the hierarchy, but what happens if the business decide to change the number of levels in the hierarchy or for some level to be skipped for a certain department or product ? Both these changes would require a change to the physical data model and this inevitably also make retaining a “what was” reporting perspective problematic. In this case, something simple like a “Bill of material” table structure (of self join model) would avoid these structural changes.