What is a UUID? - Universally Unique IDentifier - (also known as GUID, Globally Unique IDentifier)

About

UUIDs (Universally Unique IDentifiers) are generated identifiers (known as surrogates) that are designed to be unique and therefore to avoid collisions.

A UUID is an identifier that is designed to be unique across both:

time (the time-based versions embed a timestamp)
space (where the UUID is generated)

and is therefore globally unique.

A UUID is also known as a GUID (Globally Unique IDentifier).

Since UUIDs are unique and persistent, you can use them as URNs: the string representation of a UUID is fully compatible with the URN syntax. See the Example section for a URN.

Example

version 1 with the urn prefix (the 1 in 11d0)

urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6

version 4 (the 4 in 4e2b)

27701bfc-78d0-4e2b-92ca-193cea53fa30

Usage

A UUID can be used for multiple purposes, from tagging objects with an extremely short lifetime to reliably identifying very persistent objects across a network.
A UUID is generally used to uniquely identify some object or entity on the Internet, for instance:
- software installation: Windows requires them when an installer installs an application
- replication
- primary key generation outside a database

Pro/Cons

Pro:
- A UUID does not need a central authority to be created
- Quick: no round trip to a database
- Unique across all space (i.e. across every table, every database, every server)
- Replication: most replication scenarios require a GUID column
- A UUID is reasonably small. This lends itself well to sorting, ordering, and hashing of all sorts, storing in databases, simple allocation, and ease of programming in general.
Cons:
- Since a UUID has a fixed size and contains a time field, it is possible for values to roll over (around A.D. 3400, depending on the specific algorithm used).
- At 16 bytes, it is 4 times larger than a traditional 4-byte index value

Variant and version (Type)

RFC 4122 defines one variant (the layout) with five versions. Only versions 1 and 2 are time-based. The versions are also called the 5 subtypes of UUID:

1 - The time-based and MAC address version.
2 - The DCE Security version, with embedded POSIX UIDs.
3 - The name-based version that uses MD5 hashing.
4 - The randomly or pseudo-randomly generated version.
5 - The name-based version that uses SHA-1 hashing.

Which one you choose to use depends on the use-case.

The fully qualified type of a UUID is given by:

its variant field (a type) - The variant field contains a value which identifies the layout. It consists of a variable number of the most significant bits of octet 8 of the UUID.
and its version (a subtype, the five versions described above) - The version field holds a value that describes the type. It is in the most significant 4 bits of the time stamp (bits 4 through 7 of the time_hi_and_version field).
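A minimal sketch of how these two fields can be read back from the string representation. The function name describeUuid is hypothetical, and the variant labels follow the bit patterns listed in RFC 4122 section 4.1.1 (0xx, 10x, 110, 111).

```javascript
// Hypothetical helper: reads the version and variant of a UUID string.
function describeUuid(uuid) {
  // Accept an optional "urn:uuid:" prefix and drop the hyphens.
  const hex = uuid.replace(/^urn:uuid:/i, "").replace(/-/g, "");
  if (!/^[0-9a-f]{32}$/i.test(hex)) {
    throw new Error("not a UUID string");
  }
  // Version: most significant 4 bits of time_hi_and_version (13th hex digit).
  const version = parseInt(hex[12], 16);
  // Variant: most significant bits of octet 8 (clock_seq_hi_and_reserved).
  const octet8 = parseInt(hex.slice(16, 18), 16);
  let variant;
  if ((octet8 & 0x80) === 0x00) variant = "NCS (reserved)";
  else if ((octet8 & 0xc0) === 0x80) variant = "RFC 4122";
  else if ((octet8 & 0xe0) === 0xc0) variant = "Microsoft (reserved)";
  else variant = "future (reserved)";
  return { version, variant };
}

console.log(describeUuid("urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6"));
console.log(describeUuid("27701bfc-78d0-4e2b-92ca-193cea53fa30"));
```

Both example UUIDs of this page use the RFC 4122 variant and differ only in their version.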

Broadly, only three algorithms are used to generate these UUIDs:

Algorithm	UUID Versions
Unique values of 802 MAC addresses	1, 2
Pseudo-random number	4
Cryptographic hashing and application-provided text strings	3, 5

Version 1

A version 1 UUID uses:

the current time,
the MAC address (as node)

Therefore when reading a version 1 UUID, you can get:

the creation timestamp (the when)
the computer (the where)
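As a rough illustration of reading the when and the where, the sketch below decodes a version 1 UUID string. The function name readV1 is hypothetical. It assumes that the timestamp counts 100-nanosecond intervals since 15 October 1582 (the Gregorian epoch used by RFC 4122), so a constant offset converts it to the Unix epoch.

```javascript
// Offset between the Gregorian epoch (1582-10-15) and the Unix epoch,
// expressed in 100-nanosecond intervals.
const GREGORIAN_TO_UNIX_100NS = 0x01b21dd213814000n;

// Hypothetical helper: extracts the creation time and the node of a v1 UUID.
function readV1(uuid) {
  const [timeLow, timeMid, timeHiAndVersion, , node] = uuid.split("-");
  if (timeHiAndVersion[0] !== "1") {
    throw new Error("not a version 1 UUID");
  }
  // Rebuild the 60-bit timestamp: time_hi (without the version digit),
  // then time_mid, then time_low.
  const ticks = BigInt("0x" + timeHiAndVersion.slice(1) + timeMid + timeLow);
  const unixMs = Number((ticks - GREGORIAN_TO_UNIX_100NS) / 10000n);
  return {
    createdAt: new Date(unixMs),            // the when
    node: node.match(/../g).join(":"),      // the where, printed like a MAC address
  };
}

const info = readV1("f81d4fae-7dec-11d0-a765-00a0c91e6bf6");
console.log(info.createdAt.toISOString(), info.node);
```

For the example UUID of this page, the decoded date falls in February 1997 and the node reads 00:a0:c9:1e:6b:f6.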

Usage:

In a distributed system, these two pieces of information are used for tracing purposes.

Example version 1 (the 1 in 11d0)

f81d4fae-7dec-11d0-a765-00a0c91e6bf6

Version 2

A version 2 UUID uses:

the current time,
the MAC address (as node)
a local identifier such as the user or group ID (stored in the time_low part of the time field)

Therefore when reading a version 2 UUID, you can get:

the creation timestamp (the when)
the computer (the where)
the creator (the who)

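Be aware that this format comes with uniqueness problems: because the local identifier replaces the low part of the timestamp, fewer bits are left to distinguish UUIDs generated close together in time.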

Usage:

In a distributed system, these three pieces of information are used for tracing purposes.

Version 3

Version 3 and version 5 are name-based UUIDs. They differ only in the hash function (version 3 uses MD5 while version 5 uses SHA-1).

They are meant for generating UUIDs from names that are unique inside a namespace.

A name-based UUID accepts as parameters:

a namespace (a properly formatted UUID)
and a name (a string of any length)

A name-based UUID is deterministic (not random). For the same namespace and name, you will always get the same UUID.

The specification defines the following predefined namespaces.

Name type	Namespace UUID
FQDN	6ba7b810-9dad-11d1-80b4-00c04fd430c8
URL	6ba7b811-9dad-11d1-80b4-00c04fd430c8
ISO object identifier (OID)	6ba7b812-9dad-11d1-80b4-00c04fd430c8
X.500 DN (in DER or a text output format)	6ba7b814-9dad-11d1-80b4-00c04fd430c8
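The name-based generation can be sketched as follows, assuming a Node.js runtime (node:crypto and Buffer). The function name nameBasedUuid is hypothetical. The steps follow the usual reading of the specification: hash the namespace bytes followed by the name, keep the first 16 octets, then overwrite the version and variant bits.

```javascript
const { createHash } = require("node:crypto");

// Predefined namespace for fully qualified domain names (from the table above).
const NAMESPACE_DNS = "6ba7b810-9dad-11d1-80b4-00c04fd430c8";

// Hypothetical helper: builds a version 3 (MD5) or version 5 (SHA-1) UUID.
function nameBasedUuid(namespace, name, version) {
  if (version !== 3 && version !== 5) {
    throw new Error("version must be 3 or 5");
  }
  const algorithm = version === 3 ? "md5" : "sha1";
  // The namespace is hashed as its 16 raw bytes, not as text.
  const namespaceBytes = Buffer.from(namespace.replace(/-/g, ""), "hex");
  const hash = createHash(algorithm)
    .update(namespaceBytes)
    .update(name, "utf8")
    .digest();
  const bytes = hash.subarray(0, 16);            // SHA-1 is truncated to 16 octets
  bytes[6] = (bytes[6] & 0x0f) | (version << 4); // version in the high nibble of octet 6
  bytes[8] = (bytes[8] & 0x3f) | 0x80;           // RFC 4122 variant in octet 8
  const hex = bytes.toString("hex");
  return [
    hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
    hex.slice(16, 20), hex.slice(20),
  ].join("-");
}

console.log(nameBasedUuid(NAMESPACE_DNS, "python.org", 3));
console.log(nameBasedUuid(NAMESPACE_DNS, "python.org", 5));
```

Calling the function twice with the same namespace and name returns the same UUID, which is the deterministic property described above.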

Version 4

The version 4 UUID is the most used and is meant for generating UUIDs from truly random or pseudo-random numbers.

Because it's not device scoped, it provides better privacy properties.

The chances of collision are so small that they can be ignored.

For example, Google Analytics uses it to generate the anonymous ID on mobile platforms.

Except for bits 6 and 7 of the clock_seq_hi_and_reserved field (the variant) and bits 12 through 15 of the time_hi_and_version field (the version), all other bits are random:

the timestamp is a randomly or pseudo-randomly generated 60-bit value
the clock sequence is a randomly or pseudo-randomly generated 14-bit value
the node field is a randomly or pseudo-randomly generated 48-bit value

The algorithm is explained in detail in section 4.4 of the specification.
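A byte-level sketch of that algorithm, assuming a Node.js runtime for the random source (node:crypto). The function name uuidV4FromRandomBytes is hypothetical. It fills 16 octets with random data, then overwrites only the version and variant bits.

```javascript
const { randomBytes } = require("node:crypto");

// Hypothetical helper: turns 16 random octets into a version 4 UUID string.
function uuidV4FromRandomBytes(bytes = randomBytes(16)) {
  const b = Buffer.from(bytes);     // work on a copy of the input
  b[6] = (b[6] & 0x0f) | 0x40;      // version 4 in the high nibble of octet 6
  b[8] = (b[8] & 0x3f) | 0x80;      // variant bits "10" at the top of octet 8
  const hex = b.toString("hex");
  return [
    hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
    hex.slice(16, 20), hex.slice(20),
  ].join("-");
}

console.log(uuidV4FromRandomBytes());
```

Passing the bytes as a parameter keeps the function testable: with fixed input, only the 6 version and variant bits differ from the input.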

Example: version 4 (the 4 in 4e2b)

27701bfc-78d0-4e2b-92ca-193cea53fa30

Version 5

Like version 3, version 5 is a name-based UUID, but it uses the SHA-1 hash. For more information, see the Version 3 section.

Version 6

A time-ordered version based on the Gregorian epoch, proposed as a new UUID format and since standardized in RFC 9562.

Version 7

A time-ordered version based on the Unix epoch, proposed as a new UUID format and since standardized in RFC 9562.
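To show why such a format is time-ordered, here is a minimal sketch. It assumes the version 7 layout described in RFC 9562 (a 48-bit big-endian Unix timestamp in milliseconds, then version, variant and random bits) and a Node.js runtime. The function name uuidV7Sketch is hypothetical and the sketch does not implement the optional monotonic counters.

```javascript
const { randomBytes } = require("node:crypto");

// Hypothetical helper: builds a version 7 style UUID from a millisecond timestamp.
function uuidV7Sketch(nowMs = Date.now()) {
  const bytes = randomBytes(16);
  const ts = BigInt(nowMs);
  // Octets 0-5: Unix timestamp in milliseconds, most significant byte first.
  for (let i = 0; i < 6; i++) {
    bytes[i] = Number((ts >> BigInt(8 * (5 - i))) & 0xffn);
  }
  bytes[6] = (bytes[6] & 0x0f) | 0x70;  // version 7
  bytes[8] = (bytes[8] & 0x3f) | 0x80;  // variant bits "10"
  const hex = bytes.toString("hex");
  return [
    hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
    hex.slice(16, 20), hex.slice(20),
  ].join("-");
}

console.log(uuidV7Sketch());
```

Because the timestamp occupies the leading octets, UUIDs generated in different milliseconds sort in creation order when compared as strings.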

Collision

The chances of collision are so small ¹⁾ that they can be ignored.

For example, with version 4 UUIDs, there is a 50% chance of at least one collision after generating:

2.71 quintillion UUIDs
that is, 1 billion UUIDs per second for about 86 years
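These figures can be checked with the birthday approximation n ≈ sqrt(2 · N · ln(1 / (1 − p))), where N is the number of possible values. The sketch below assumes 122 random bits for a version 4 UUID (128 bits minus the 6 fixed version and variant bits). The function name is hypothetical.

```javascript
// Hypothetical helper: approximate number of random UUIDs to generate before
// the probability of at least one collision reaches p (birthday approximation).
function uuidsForCollisionProbability(p, randomBits = 122) {
  const possibleValues = Math.pow(2, randomBits);
  return Math.sqrt(2 * possibleValues * Math.log(1 / (1 - p)));
}

const count = uuidsForCollisionProbability(0.5);
const secondsPerYear = 365.25 * 24 * 3600;
const years = count / 1e9 / secondsPerYear; // at 1 billion UUIDs per second

console.log(`UUIDs for a 50% collision chance: ${count.toExponential(2)}`);
console.log(`Years at 1 billion UUIDs per second: ${years.toFixed(0)}`);
```

The result is on the order of 2.71 × 10^18 UUIDs, which lands near the figures quoted above.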

Format / Size

For all versions, the UUID:

has a fixed size of:
- 16 octets (i.e. 16 bytes, 128 bits)
- 36 characters (with the hexadecimal string representation)
  - 32 hexadecimal characters (16 octets * 2 because 1 hexadecimal character = 4 bits)
  - 4 hyphen characters
has a layout that contains a time field (a real timestamp only in the time-based versions).

String

The format of the UUID hexadecimal string representation is given below in ABNF syntax.

UUID = time-low "-" time-mid "-" time-high-and-version "-" clock-seq-and-reserved clock-seq-low "-" node

where:

time-low = 4hexOctet
time-mid = 2hexOctet
time-high-and-version = 2hexOctet
clock-seq-and-reserved = hexOctet
clock-seq-low = hexOctet
node = 6hexOctet

Note that:

hexOctet = hexDigit hexDigit
hexDigit = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" / "a" / "b" / "c" / "d" / "e" / "f" / "A" / "B" / "C" / "D" / "E" / "F"

Because many applications use UUIDs as URNs, the bit sequence is generally converted to this string representation.
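The grammar can be mirrored with one capture group per field. This is only a sketch; the names UUID_FIELDS and splitUuid are hypothetical.

```javascript
// One capture group per field of the string grammar.
const UUID_FIELDS =
  /^([0-9a-f]{8})-([0-9a-f]{4})-([0-9a-f]{4})-([0-9a-f]{2})([0-9a-f]{2})-([0-9a-f]{12})$/i;

// Hypothetical helper: splits a UUID string into its named fields,
// or returns null when the text does not follow the grammar.
function splitUuid(text) {
  const match = UUID_FIELDS.exec(text);
  if (!match) return null;
  const [, timeLow, timeMid, timeHighAndVersion,
    clockSeqAndReserved, clockSeqLow, node] = match;
  return { timeLow, timeMid, timeHighAndVersion, clockSeqAndReserved, clockSeqLow, node };
}

console.log(splitUuid("f81d4fae-7dec-11d0-a765-00a0c91e6bf6"));
```

Note that the fourth group of the string holds two fields (clock-seq-and-reserved and clock-seq-low) with no hyphen between them.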

Pattern

A regular expression pattern that matches a version 4 UUID (case-insensitive):

const UUID4_PATTERN = "/^[0-9A-F]{8}-[0-9A-F]{4}-[4][0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i";
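In JavaScript, the same pattern can be written as a regular expression literal, without the surrounding string delimiters. The names UUID4_REGEX and isUuidV4 below are hypothetical.

```javascript
// Same pattern as above, written as a JavaScript regular expression literal.
const UUID4_REGEX =
  /^[0-9A-F]{8}-[0-9A-F]{4}-4[0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i;

// Hypothetical helper: true only for strings shaped like a version 4 UUID.
function isUuidV4(text) {
  return typeof text === "string" && UUID4_REGEX.test(text);
}

console.log(isUuidV4("27701bfc-78d0-4e2b-92ca-193cea53fa30")); // version 4 example
console.log(isUuidV4("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")); // version 1 example
```

The pattern checks the shape only (version digit 4 and variant digit 8, 9, a or b); it says nothing about how the other bits were generated.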

Specification

Implementation

Javascript Code

Javascript UUID4

The Web API crypto.randomUUID generates a version 4 UUID ²⁾

console.log(window.crypto.randomUUID());

An implementation may look something like this ³⁾

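function uuidv4() {
  // The expression below builds the template "10000000-1000-4000-8000-100000000000".
  // Every 0 and 1 becomes a random hex digit, the 8 becomes 8, 9, a or b (variant),
  // and the 4 (version) is left untouched.
  return ([1e7]+-1e3+-4e3+-8e3+-1e11).replace(/[018]/g, c =>
    (c ^ crypto.getRandomValues(new Uint8Array(1))[0] & 15 >> c / 4).toString(16)
  );
}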

console.log(uuidv4());

Javascript Package

https://www.npmjs.com/package/uuid

Example:

the string

let uuidString = uuid.v4()
console.log(`Hexadecimal String: ${uuidString}`);

its size in characters

console.log(`Hexadecimal String Character length: ${uuidString.length}`);

its size in bits without the 4 hyphens

console.log(`Hexadecimal String Bit length: ${(uuidString.length - 4) * 4}`);

the byte array (parse returns a 16 bytes array)

let uuidByteArray = uuid.parse(uuidString); // 16 bytes array
console.log(`Binary byte length: ${uuidByteArray.length}`);
console.log(`Binary bit length: ${uuidByteArray.length * 8}`);
console.log(uuidByteArray);

Windows SDK

The Windows SDK you get by installing Visual Studio also contains a GUID-generation tool, available, for example, at

%ProgramFiles%\Microsoft SDKs\Windows\v7.1A\Bin\uuidgen.exe
%ProgramFiles%\Windows Kits\8.1\bin\x86\uuidgen.exe

Database

In a database, you can store a UUID:

in binary (smaller)
or in text

Storing the UUID in binary makes the data unreadable for humans, so it is not really practical when writing queries.

The standard hexadecimal text representation can be shortened by:

using a bigger base (such as base64)
and removing the separators
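A sketch of that shortening, assuming a Node.js runtime (Buffer with the base64url encoding). The function names are hypothetical. The 36-character text becomes 22 characters and can be converted back without loss.

```javascript
// Hypothetical helper: 36-character UUID text to a 22-character base64url text.
function uuidToBase64Url(uuid) {
  // Drop the hyphens, read the 16 octets, then encode them in base64url.
  return Buffer.from(uuid.replace(/-/g, ""), "hex").toString("base64url");
}

// Hypothetical helper: the reverse conversion.
function base64UrlToUuid(text) {
  const hex = Buffer.from(text, "base64url").toString("hex");
  return [
    hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
    hex.slice(16, 20), hex.slice(20),
  ].join("-");
}

const shortId = uuidToBase64Url("27701bfc-78d0-4e2b-92ca-193cea53fa30");
console.log(shortId, shortId.length);
console.log(base64UrlToUuid(shortId));
```

The base64url alphabet is used here rather than plain base64 so that the short form stays safe in URLs and file names; that choice is an assumption of this sketch.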

If you still want to have a smaller id, you can always create one that suits your needs.

Query performance should not suffer much, as even an index on text uses a binary representation. ⁴⁾

⁵⁾

¹⁾ Collision in UUID
²⁾ Web/API/Crypto/randomUUID
³⁾ Stackoverflow
⁴⁾ how-to-efficient-insert-and-fetch-uuid-in-core-data
⁵⁾ global universal identifier (GUID)