About
A many-to-many relationship is a relationship (an m:n mapping) between two relation A and B in which :
- A may contain a parent row for which there are many children in B
- and vice versa.
The data warehouse world does not function with normalized fact tables since you need to start doing lots of select count(distinct) resulting in facts columns not being additive.
Example
- each project may have zero, one, or more employees assigned to it and each employee may be assigned to zero, one, or more projects.
- relationship between users and groups:
- Users can belong to a group,
- and groups comprise a list of users.
- a author has contributed to many book, a book is written by many authors
- in a Sales database, each product belongs to one or more groups, and each group contains multiple products
- in banking, a customer can have different accounts, and an account can belong to different customers.
- in insurance, a customer (or household) can have different policies, but a policy can support multiple customers (or households). In many businesses, a parent customer can have subsidiaries.
When you need to model a many-to-many relationship in a Dimensional Data Modeling - Dimensional Schemas, you fall in a snowflake schema.
How to model a many-to-many relationship ?
There are a few ways of implementing many-to-many relationships :
- the Bridge Table,
- the Boolean Column,
- or the Multiple Column method
The Bridge Table Method
See the many-to-many section of the Association table (bridge,…)
Boolean Column Method
The boolean column method consists of creating a column for each possible value, like so:
Each boolean column can be a numeric (1 or 0), or character (Y or N).
Multiple Column Method
The multiple column method consists of having columns for the number match the number of choices one can make. This has its limitations since it is tightly coupled to the application; but is easily transformed. The typical data model would look like:
In this case, you could only have five sports even though you could have several dozen choices.
Boolean Method / Bridge
The Boolean Method is far superior in every category except the breakdown, and the Bridge is a close second. You have to use the Bridge method if the number of potential values in the dimension exceeds 100 columns or so. You can do it, but it looks ugly. The best of both worlds would be to merge the Boolean and Bridge column methods. Its fairly straight–forward: add the boolean columns to the bridge table as follows:
How to avoid a bridge table?
There are several approaches:
- Hide the many-to-many relationship. You can publish two versions of the schema: the full one for use by structured reporting and a handful of power users, and a version that eliminates the many-to-many relationship for use by more casual users.
- Eliminate the many-to-many relationship by collapsing/aggregating multiple rows. Add a row to the many-to-many dimension table: “Multiple rows”. The fact table can then link directly with the dimension. As with all design decisions, the IT organization cannot choose this approach without consulting with the user community. For something which has limited information value, this approach may be quite acceptable.
- Identify a single primary row (The top 1) It may be possible to identify a primary row, either based on some logic in the transaction system or by way of business rules.
- Pivot out the many-to-many dimension (see below the Boolean Column Method). If the domain of the multi-choice space is small, you can eliminate the bridge table by creating a dimension table with one column for each choice.