Logical Data Modeling - (Surrogate | Substitute | Synthetic) Primary key


A surrogate key is a substitute primary key

  • for when you don't have a real one (entity not found for instance)
  • when the key is too long for the database (example: url for a web page)

Surrogate means to choose in place of another

It's a primary key that holds a generated value.

A surrogate key exists almost always next to a business key.


  • An ID column populated via a sequence is known as a surrogate key.
    • of a long natural key (such as an URL)
    • of all values of the record
SELECT * FROM tbl_name
  WHERE surrogate_pk=MD5(urlValue)
  AND url=urlValue;


No key

If there is no logical unique and non-null column or set of columns


Numeric values also produce shorter values of predictable length. Storing numeric codes in place of string is more space-efficient.

Dimensional Modeling

The use of SKs in dimensional modeling is not only simplicity of star design or performance.

You often need to populate dimensions with extra records not found in the source table, such as:

  • or “NOT APPLIES” records.

If you don't use SKs, What PK value would you use for this extra records?

How can you be sure that the values you choose won't collide with future records from the source table?

Also, taking your dimensional schemas away from possible source key changes is a good idea. Of course creating and maintaining SKs adds complexity and is extra work for ETL processes.

Documentation / Reference

Powered by ComboStrap