Logical Data Modeling - (Surrogate | Substitute | Synthetic) Primary key

1 - About

A surrogate key is a substitute primary key

  • for when you don't have a real one (entity not found for instance)
  • when the key is too long for the database (example: url for a web page)

Surrogate means to choose in place of another

It's a primary key that holds a generated value.

A surrogate key exists almost always next to a business key.

3 - Example

  • An ID column populated via a sequence is known as a surrogate key.
    • of a long natural key (such as an URL)
    • of all values of the record

SELECT * FROM tbl_name
  WHERE surrogate_pk=MD5(urlValue)
  AND url=urlValue;

4 - Usage

4.1 - No key

If there is no logical unique and non-null column or set of columns

4.2 - Space

Numeric values also produce shorter values of predictable length. Storing numeric codes in place of string is more space-efficient.

4.3 - Dimensional Modeling

The use of SKs in dimensional modeling is not only simplicity of star design or performance.

You often need to populate dimensions with extra records not found in the source table, such as:

  • or “NOT APPLIES” records.

If you don't use SKs, What PK value would you use for this extra records?

How can you be sure that the values you choose won't collide with future records from the source table?

Also, taking your dimensional schemas away from possible source key changes is a good idea. Of course creating and maintaining SKs adds complexity and is extra work for ETL processes.

5 - Documentation / Reference


Data Science
Data Analysis
Statistics
Data Science
Linear Algebra Mathematics
Trigonometry

Powered by ComboStrap