About
A UUID (Universally Unique IDentifier) is a generated identifier (also known as a surrogate) that is designed to be unique and therefore to avoid collisions.
A UUID is an identifier that is unique across both:
- time (every UUID has a time field)
- space (where the UUID is generated)
and is therefore globally unique.
A UUID is also known as a GUID (Globally Unique IDentifier).
Since UUIDs are unique and persistent, you can use them as URNs: the string representation of a UUID is fully compatible with the URN syntax. See the Example section for a URN.
Example
- version 1 with the urn prefix (the 1 in 11d0)
urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
- version 4 (the 4 in 4e2b)
27701bfc-78d0-4e2b-92ca-193cea53fa30
Usage
- A UUID can be used for multiple purposes, from tagging objects with an extremely short lifetime to reliably identifying very persistent objects across a network.
- A UUID is generally used to uniquely identify some object or entity on the Internet.
- For instance, Windows requires them when an installer installs an application.
- Replication.
- Primary key generation outside a database.
Pro/Cons
- Pro:
  - A UUID does not need a central authority to be created
  - Quick: no round trip to a database
  - Unique across all space (i.e. across every table, every database, every server)
  - Replication: most scenarios require a GUID column
  - A UUID is reasonably small. This lends itself well to sorting, ordering, and hashing of all sorts, storing in databases, simple allocation, and ease of programming in general.
- Cons:
  - Since a UUID has a fixed size and contains a time field, it is possible for values to roll over (around A.D. 3400, depending on the specific algorithm used).
  - At 16 bytes, it is 4 times larger than the traditional 4-byte index value
Variant and version (Type)
RFC 4122 defines a UUID variant with five versions, also called the five sub-types of UUID. All versions share the same field layout, whose field names come from the time-based version:
- 1 - The time-based version, which also uses the MAC address.
- 2 - The DCE Security version, with embedded POSIX UIDs.
- 3 - The name-based version that uses MD5 hashing.
- 4 - The randomly or pseudo-randomly generated version.
- 5 - The name-based version that uses SHA-1 hashing.
Which one you choose to use depends on the use-case.
The fully qualified type of a UUID is given by:
- its variant field (a type): the variant field contains a value that identifies the layout. It consists of a variable number of the most significant bits of octet 8 of the UUID.
- and its version (a subtype, one of the five versions described above): the version field holds a value that describes the type. It is in the most significant 4 bits of the timestamp (bits 4 through 7 of the time_hi_and_version field).
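As an illustration of where these two fields sit, here is a minimal sketch that reads them from the string form. The helper name describeUuid and the variant labels are assumptions made for this example; they are not part of any library.

```javascript
// Sketch: read the version and variant fields from a UUID string.
// describeUuid is a hypothetical helper name used only for illustration.
function describeUuid(uuid) {
  const hex = uuid.replace(/^urn:uuid:/i, '').replace(/-/g, '');
  if (!/^[0-9a-f]{32}$/i.test(hex)) {
    throw new Error(`Not a UUID: ${uuid}`);
  }
  // Octet 6 is the high byte of time_hi_and_version: its top 4 bits are the version.
  const version = parseInt(hex.slice(12, 14), 16) >> 4;
  // Octet 8 is clock_seq_hi_and_reserved: its top 1 to 3 bits are the variant.
  const octet8 = parseInt(hex.slice(16, 18), 16);
  let variant;
  if ((octet8 & 0x80) === 0x00) variant = 'NCS (reserved)';
  else if ((octet8 & 0xc0) === 0x80) variant = 'RFC 4122';
  else if ((octet8 & 0xe0) === 0xc0) variant = 'Microsoft (reserved)';
  else variant = 'future (reserved)';
  return { version, variant };
}

console.log(describeUuid('urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6'));
console.log(describeUuid('27701bfc-78d0-4e2b-92ca-193cea53fa30'));
```

Both sample UUIDs of this page carry the RFC 4122 variant; only the version digit differs.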
On a global level, only three algorithms are used to generate these UUIDs:
Algorithm | UUID Versions |
---|---|
Unique values of 802 MAC addresses | 1, 2 |
pseudo-random number | 4 |
cryptographic hashing and application-provided text strings | 3, 5 |
Version 1
A version 1 UUID uses:
- the current time,
- the MAC address (as node)
Therefore when reading a version 1 UUID, you can get:
- the creation timestamp (the when)
- the computer (the where)
Usage:
- In a distributed system, these two pieces of information are used for tracing purposes.
Example version 1 (the 1 in 11d0)
f81d4fae-7dec-11d0-a765-00a0c91e6bf6
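To make the "when" and the "where" concrete, the following sketch decodes a version 1 UUID. It assumes the RFC 4122 layout (a 60-bit count of 100-nanosecond intervals since 1582-10-15, split over time_low, time_mid and time_hi). The helper name decodeV1 is hypothetical.

```javascript
// Sketch: decode the timestamp and node of a version 1 UUID.
// decodeV1 is a hypothetical helper name used only for illustration.

// Number of 100-ns intervals between 1582-10-15 (Gregorian epoch)
// and 1970-01-01 (Unix epoch).
const GREGORIAN_TO_UNIX_OFFSET = 0x01b21dd213814000n;

function decodeV1(uuid) {
  const [timeLow, timeMid, timeHiAndVersion, , node] = uuid.split('-');
  if (timeHiAndVersion[0] !== '1') {
    throw new Error('Not a version 1 UUID');
  }
  // Reassemble the 60-bit timestamp: time_hi (12 bits), time_mid, time_low.
  const timestamp = BigInt('0x' + timeHiAndVersion.slice(1) + timeMid + timeLow);
  // Convert to Unix milliseconds (1 ms = 10,000 intervals of 100 ns).
  const unixMillis = (timestamp - GREGORIAN_TO_UNIX_OFFSET) / 10000n;
  return {
    createdAt: new Date(Number(unixMillis)), // the when
    node: node.match(/../g).join(':'),       // the where (MAC address)
  };
}

const decoded = decodeV1('f81d4fae-7dec-11d0-a765-00a0c91e6bf6');
console.log(decoded.createdAt.toISOString());
console.log(decoded.node);
```

This readability is also why a version 1 UUID leaks information about the machine that created it.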
Version 2
A version 2 UUID uses:
- the current time,
- the MAC address (as node)
- a local identifier such as the user or group ID (stored in the time_low part of the time field)
Therefore when reading a version 2 UUID, you can get:
- the creation timestamp (the when)
- the computer (the where)
- the creator (the who)
Be aware that this format comes with uniqueness problems.
Usage:
- In a distributed system, these three pieces of information are used for tracing purposes.
Version 3
Version 3 and version 5 are name-based UUIDs. They differ only in the hash function: version 3 uses MD5 while version 5 uses SHA-1.
They are meant for generating UUIDs from names that are unique inside a namespace.
A name-based UUID accepts as parameters a namespace (itself a UUID) and a name.
A name-based UUID is deterministic (not random): for the same namespace and name, you always get the same UUID.
The specification defines the following predefined namespaces (RFC 4122, Appendix C).
Name type | Namespace UUID |
---|---|
FQDN | 6ba7b810-9dad-11d1-80b4-00c04fd430c8 |
URL | 6ba7b811-9dad-11d1-80b4-00c04fd430c8 |
ISO object identifier (OID) | 6ba7b812-9dad-11d1-80b4-00c04fd430c8 |
X.500 DN (in DER or a text output format) | 6ba7b814-9dad-11d1-80b4-00c04fd430c8 |
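The deterministic behaviour can be sketched with the Node.js crypto module. This is an illustrative implementation under the RFC 4122 rules (hash the namespace bytes followed by the name, keep the first 16 bytes, then set the version and variant bits), not a reference one. The function name nameBasedUuid is hypothetical.

```javascript
// Sketch: name-based UUID (version 3 with MD5, version 5 with SHA-1).
// nameBasedUuid is a hypothetical helper name used only for illustration.
const { createHash } = require('node:crypto');

// Predefined namespace for fully qualified domain names (see the table above).
const NAMESPACE_DNS = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';

function nameBasedUuid(namespace, name, version = 5) {
  if (version !== 3 && version !== 5) {
    throw new Error('Only versions 3 and 5 are name-based');
  }
  const algorithm = version === 3 ? 'md5' : 'sha1';
  const namespaceBytes = Buffer.from(namespace.replace(/-/g, ''), 'hex');
  // Hash the namespace bytes followed by the UTF-8 bytes of the name.
  const hash = createHash(algorithm)
    .update(namespaceBytes)
    .update(name, 'utf8')
    .digest();
  const bytes = hash.subarray(0, 16);             // SHA-1 is truncated to 128 bits
  bytes[6] = (bytes[6] & 0x0f) | (version << 4);  // version bits
  bytes[8] = (bytes[8] & 0x3f) | 0x80;            // variant bits (10)
  const hex = bytes.toString('hex');
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}

// The same namespace and name always produce the same UUID.
console.log(nameBasedUuid(NAMESPACE_DNS, 'python.org'));
console.log(nameBasedUuid(NAMESPACE_DNS, 'python.org'));
```

Because the result depends only on the inputs, two systems can derive the same identifier without exchanging it.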
Version 4
The version 4 UUID is the most used and is meant for generating UUIDs from truly random or pseudo-random numbers.
Because it is not device scoped, it provides better privacy properties.
The chance of a collision is so small that it can be ignored.
For instance, Google Analytics uses it to generate the Anonymous Id on the mobile platform.
Except for the variant bits (bits 6 and 7 of clock_seq_hi_and_reserved) and the version bits (bits 12 through 15 of time_hi_and_version), all other bits are random:
- the timestamp is a randomly or pseudo-randomly generated 60-bit value
- the clock sequence is a randomly or pseudo-randomly generated 14-bit value
- the node field is a randomly or pseudo-randomly generated 48-bit value
The algorithm is explained in detail in section 4.4 of the specification.
Example: version 4 (the 4 in 4e2b)
27701bfc-78d0-4e2b-92ca-193cea53fa30
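The bit layout described in this section can be sketched as follows: take 16 random bytes, then overwrite the 4 version bits and the 2 variant bits. This is a readable illustration under the assumption of a Node.js runtime. The function name randomUuidV4 is hypothetical, and in practice the built-in crypto.randomUUID() is the simpler choice.

```javascript
// Sketch: build a version 4 UUID from 16 random bytes.
// randomUuidV4 is a hypothetical helper name used only for illustration.
const { randomBytes } = require('node:crypto');

function randomUuidV4() {
  const bytes = randomBytes(16);        // 128 random bits
  // Bits 12 to 15 of time_hi_and_version: version 4 (0100).
  bytes[6] = (bytes[6] & 0x0f) | 0x40;
  // Bits 6 and 7 of clock_seq_hi_and_reserved: variant (10).
  bytes[8] = (bytes[8] & 0x3f) | 0x80;
  const hex = bytes.toString('hex');
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}

console.log(randomUuidV4());
```

The 6 fixed bits leave 122 random bits, which is the figure used when estimating the collision probability.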
Version 5
Version 5 is, like version 3, a name-based UUID, but it uses the SHA-1 hash. For more information, see the Version 3 section.
Version 6
A time-ordered version based on the Gregorian epoch, proposed as a new UUID format and since standardized in RFC 9562.
Version 7
A time-ordered version based on the Unix epoch, proposed as a new UUID format and since standardized in RFC 9562.
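As a rough sketch of what "time-ordered with Unix epoch" means, the code below assumes the RFC 9562 layout for version 7: the first 48 bits hold the Unix time in milliseconds, followed by the version bits, the variant bits and random data. The function name uuidV7 is hypothetical, and this sketch does not try to guarantee ordering between UUIDs generated within the same millisecond.

```javascript
// Sketch: build a version 7 UUID (assumed layout: 48-bit Unix milliseconds,
// then version, variant and random bits).
// uuidV7 is a hypothetical helper name used only for illustration.
const { randomBytes } = require('node:crypto');

function uuidV7(unixMillis = Date.now()) {
  const bytes = randomBytes(16);
  // First 6 bytes: Unix timestamp in milliseconds, big-endian.
  bytes.writeUIntBE(unixMillis, 0, 6);
  bytes[6] = (bytes[6] & 0x0f) | 0x70;  // version 7
  bytes[8] = (bytes[8] & 0x3f) | 0x80;  // variant (10)
  const hex = bytes.toString('hex');
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}

// UUIDs created at a later millisecond sort after earlier ones as plain strings.
const earlier = uuidV7(1000);
const later = uuidV7(2000);
console.log(earlier < later);
console.log(uuidV7());
```

Because the timestamp comes first, such identifiers tend to be inserted in index order, which is the main motivation for a time-ordered format.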
Collision
The chance of a collision is so small that it can be ignored.
For example, for the version-4 UUIDs, there is a 50% chance of collision when generating:
- 2.71 quintillion UUIDs
- or 1 billion UUIDs per second for about 86 years
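These figures can be checked with the usual birthday approximation, n ≈ sqrt(2 · N · ln(1 / (1 - p))), where N is the number of possible values and p the target collision probability. The sketch below assumes N = 2^122, because a version 4 UUID has 122 random bits. The function name is hypothetical.

```javascript
// Sketch: birthday approximation for version 4 UUIDs.
// uuidsForCollisionProbability is a hypothetical helper name.
const RANDOM_BITS = 122;          // 128 bits minus 4 version bits and 2 variant bits
const SPACE = 2 ** RANDOM_BITS;   // number of distinct version 4 UUIDs

// Number of UUIDs to generate to reach collision probability p.
function uuidsForCollisionProbability(p) {
  return Math.sqrt(2 * SPACE * Math.log(1 / (1 - p)));
}

const count = uuidsForCollisionProbability(0.5);
const secondsPerYear = 365.25 * 24 * 3600;
const years = count / 1e9 / secondsPerYear;

console.log(`UUIDs for a 50% collision chance: ${count.toExponential(2)}`); // 2.71e+18
console.log(`Years at 1 billion UUIDs per second: ${years.toFixed(1)}`);
```

The result is an approximation in floating point, which is accurate enough at this order of magnitude.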
Format / Size
For all versions, the UUID:
- has a fixed size of:
  - 16 octets (i.e. 16 bytes, 128 bits)
  - 36 characters (with the hexadecimal string representation):
    - 32 hexadecimal characters (16 octets * 2, because 1 hexadecimal character = 4 bits)
    - 4 hyphen characters
- contains a time field.
String
The format of the UUID hexadecimal string representation is defined by the following BNF-style grammar:
UUID = time-low "-" time-mid "-" time-high-and-version "-" clock-seq-and-reserved clock-seq-low "-" node
where:
- time-low = 4hexOctet
- time-mid = 2hexOctet
- time-high-and-version = 2hexOctet
- clock-seq-and-reserved = hexOctet
- clock-seq-low = hexOctet
- node = 6hexOctet
Note that:
- hexOctet = hexDigit hexDigit
- hexDigit = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" / "a" / "b" / "c" / "d" / "e" / "f" / "A" / "B" / "C" / "D" / "E" / "F"
Because applications commonly use them as URNs, the bit sequence is generally converted to this string representation.
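The conversion between the 16-octet sequence and the string representation can be sketched without any dependency. The helper names uuidToBytes and bytesToUuid are hypothetical and only illustrate the grammar given in this section.

```javascript
// Sketch: convert between the string form and the 16-byte form of a UUID.
// uuidToBytes and bytesToUuid are hypothetical helper names.
function uuidToBytes(uuid) {
  const hex = uuid.replace(/-/g, '');
  if (!/^[0-9a-f]{32}$/i.test(hex)) {
    throw new Error(`Not a UUID: ${uuid}`);
  }
  // Each pair of hexadecimal digits (hexOctet) is one byte.
  return Uint8Array.from(hex.match(/../g), (pair) => parseInt(pair, 16));
}

function bytesToUuid(bytes) {
  if (bytes.length !== 16) {
    throw new Error('A UUID has exactly 16 bytes');
  }
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
  // time-low - time-mid - time-high-and-version - clock-seq - node
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}

const bytes = uuidToBytes('27701bfc-78d0-4e2b-92ca-193cea53fa30');
console.log(bytes.length);        // 16
console.log(bytesToUuid(bytes));  // 27701bfc-78d0-4e2b-92ca-193cea53fa30
```

The output of bytesToUuid is always lowercase, whereas the grammar accepts both cases on input.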
Pattern
- php
const UUID4_PATTERN = "/^[0-9A-F]{8}-[0-9A-F]{4}-[4][0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i";
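The same pattern can be written in JavaScript. This is a minimal sketch; the function name isUuidV4 is hypothetical.

```javascript
// Sketch: validate the string form of a version 4 UUID.
// isUuidV4 is a hypothetical helper name used only for illustration.
// - the third group starts with 4 (the version)
// - the fourth group starts with 8, 9, a or b (the RFC 4122 variant)
const UUID4_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

function isUuidV4(value) {
  return typeof value === 'string' && UUID4_PATTERN.test(value);
}

console.log(isUuidV4('27701bfc-78d0-4e2b-92ca-193cea53fa30')); // true
console.log(isUuidV4('f81d4fae-7dec-11d0-a765-00a0c91e6bf6')); // false (version 1)
```

The pattern checks only the shape of the string; it cannot tell whether the value was actually generated randomly.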
Specification
Implementation
Javascript Code
Javascript UUID4
console.log(window.crypto.randomUUID());
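- An implementation may look something like this:
function uuidv4() {
  // Template "10000000-1000-4000-8000-100000000000": every 0 and 1 is replaced by a
  // random hexadecimal digit, the 4 (version) is kept, and the 8 becomes 8, 9, a or b (variant).
  return ([1e7]+-1e3+-4e3+-8e3+-1e11).replace(/[018]/g, c =>
    (c ^ crypto.getRandomValues(new Uint8Array(1))[0] & 15 >> c / 4).toString(16)
  );
}
console.log(uuidv4());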
Javascript Package
https://www.npmjs.com/package/uuid
Example (assuming the package is available as the uuid namespace, for instance through import * as uuid from 'uuid'):
- the string
let uuidString = uuid.v4();
console.log(`Hexadecimal String: ${uuidString}`);
- its size in characters
console.log(`Hexadecimal String Character length: ${uuidString.length}`);
- its size in bits, without the 4 hyphens
console.log(`Hexadecimal String Bit length: ${(uuidString.length - 4) * 4}`);
- the byte array (parse returns a 16-byte array)
let uuidByteArray = uuid.parse(uuidString); // 16 bytes array
console.log(`Binary byte length: ${uuidByteArray.length}`);
console.log(`Binary bit length: ${uuidByteArray.length * 8}`);
console.log(uuidByteArray);
Windows SDK
The Windows SDK that you get by installing Visual Studio also contains a GUID-generation tool, available, for example, at:
%ProgramFiles%\Microsoft SDKs\Windows\v7.1A\Bin\uuidgen.exe
%ProgramFiles%\Windows Kits\8.1\bin\x86\uuidgen.exe
Database
In a database, you can store a UUID:
- in binary (smaller)
- or in text
Storing the UUID in binary makes the data unreadable for humans, which is not really practical when writing a query.
The standard hexadecimal text representation can be shortened, for instance by deleting the separators.
If you still want a smaller id, you can always create one that suits your needs.
Query performance will not really suffer, as even an index on text uses a binary representation.
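The storage sizes can be compared with a short sketch. Deleting the hyphens comes from the text above; the base64url form is an assumption added here as one possible shorter text encoding, and the helper names are hypothetical. The code assumes a Node.js runtime recent enough for Buffer to support the 'base64url' encoding.

```javascript
// Sketch: compare text representations of the same 16 bytes.
// toCompactHex, toBase64Url and fromBase64Url are hypothetical helper names.

// 32 characters: the standard form without its 4 hyphens.
function toCompactHex(uuid) {
  return uuid.replace(/-/g, '').toLowerCase();
}

// 22 characters: the 16 bytes encoded with the URL-safe base64 alphabet.
function toBase64Url(uuid) {
  return Buffer.from(toCompactHex(uuid), 'hex').toString('base64url');
}

// Back to the standard 36-character form.
function fromBase64Url(text) {
  const hex = Buffer.from(text, 'base64url').toString('hex');
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}

const id = '27701bfc-78d0-4e2b-92ca-193cea53fa30';
console.log(id.length);                      // 36
console.log(toCompactHex(id).length);        // 32
console.log(toBase64Url(id).length);         // 22
console.log(fromBase64Url(toBase64Url(id))); // 27701bfc-78d0-4e2b-92ca-193cea53fa30
```

A shorter text form saves space but is case-sensitive, so it is only safe in columns and URLs that preserve case.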