Relational Data Modeling - Unique (Constraint|Key)

1 - About

Unique key implementation in a relational environment

A unique constraint is a constraint that enforces the Entity Integrity e.g. that every value in a column or set of columns (key) be unique—that is, no two rows of a table have duplicate values in a specified column or set of columns.

A unique constraint is also known as a:

  • a unique key
  • a unique key constraint
  • a unique integrity constraint
Unique key is often incorrectly used as a synonym for the terms UNIQUE key constraint or UNIQUE index. However, note that key refers only to the column or set of columns used in the definition of the integrity constraint.

A unique constraint allow the input of nulls unless you also define NOT NULL constraints for the same columns.

The unique key is not the access key as it can change over time. If you need to retrieve a record in a table, the primary key must be used.

3 - Type

3.1 - Simple

A simple unique key is a unique key that consists of one column.

3.2 - Composite

A composite unique key is a UNIQUE key that consists of group of columns (>1)

4 - Creation

4.1 - With unique index

Database enforces generally the UNIQUE key constraint by implicitly creating a unique index on the composite unique key.

4.2 - Without unique index

Because the creation and management of a index is time consuming, for performance reason, the creation of a unique index may be disabled. The unique constraint stays to give information to the query optimizer on the data but is not enforced.

5 - Unique constraint on a large table

5.1 - The problematic

A UNIQUE constraint is typically enforced using a UNIQUE index. However, in a data warehouse whose tables can be extremely large, creating a unique index can be costly both in processing time and in disk space.

Suppose that a data warehouse contains a table sales, which includes a column sales_id. sales_id uniquely identifies a single sales transaction, and the data warehouse administrator must ensure that this column is unique within the data warehouse.

One way to create the constraint is as follows (Oracle)

UNIQUE (prod_id, cust_id, promo_id, channel_id, time_id);

By default, the unique constraint is both enabled and validated. Oracle implicitly creates a unique index on sales_id to support this constraint.

However, this index can be problematic in a data warehouse for three reasons:

  • The unique index can be very large, because the sales table can easily have millions or even billions of rows.
  • The unique index is rarely used for query execution. Most data warehousing queries do not have predicates on unique keys, so creating this index will probably not improve performance.
  • If sales is partitioned along a column other than sales_id, the unique index must be global. This can detrimentally affect all maintenance operations on the sales table.

5.2 - An alternative mechanism - DISABLE VALIDATE

A unique index is required for unique constraints to ensure that each individual row MODIFIED in the sales table satisfies the unique constraint.

For data warehousing tables, an alternative mechanism for unique constraints is illustrated in the following statement:

UNIQUE (prod_id, cust_id, promo_id, channel_id, time_id) DISABLE VALIDATE;

This statement creates a unique constraint, but, because the constraint is disabled, a unique index is not required. This approach can be advantageous for many data warehousing environments because the constraint now ensures uniqueness without the cost of a unique index.

However, there are trade-offs for the data warehouse administrator to consider with DISABLE VALIDATE constraints. Because this constraint is disabled, no DML statements that modify the unique column are permitted against the sales table. You can use one of two strategies for modifying this table in the presence of a constraint:

  • Use DDL to add data to this table (such as exchanging partitions).
  • Before modifying this table, drop the constraint. Then, make all necessary data modifications. Finally, re-create the disabled constraint. Re-creating the constraint is more efficient than re-creating an enabled constraint. However, this approach does not guarantee that data added to the sales table while the constraint has been dropped is unique.

6 - Example and Performance

With unique constraint :

SQL> set timing on;
SQL> insert into d_material_2 select * from d_material;

99384 rows created.

Elapsed: 00:00:00.31

Without unique constraint:

SQL> insert into d_material_2 select * from d_material;

99384 rows created.

Elapsed: 00:00:00.71

With unique constraint RELY DISABLE NOVALIDATE


Table altered.

Elapsed: 00:00:00.01
SQL> insert into d_material_2 select * from d_material;

99384 rows created.

Elapsed: 00:00:00.31

Data Science
Data Analysis
Data Science
Linear Algebra Mathematics

Powered by ComboStrap