What is a UUID? - Universally Unique IDentifier - (also known as GUID, Globally Unique IDentifier)

About

A UUID (Universally Unique IDentifier) is a generated identifier (known as a surrogate) that is designed to be unique and therefore to avoid collisions.

A UUID is an identifier that is unique across both:

  • time (every UUID has a timestamp field)
  • space (where the UUID is generated)

and is therefore globally unique.

This is why a UUID is also known as a GUID (Globally Unique IDentifier).

Since UUIDs are unique and persistent, you can use them as URNs: the string representation of a UUID is fully compatible with the URN syntax. The first example below uses the urn prefix.

Example

  • version 1 with the urn prefix (the 1 in 11d0)
urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6

  • version 4 (the 4 in 4e2b)
27701bfc-78d0-4e2b-92ca-193cea53fa30

Usage

  • A UUID can be used for multiple purposes, from tagging objects with an extremely short lifetime to reliably identifying very persistent objects across a network.
  • A UUID is generally used to uniquely identify some object or entity on the Internet.
  • For instance, Windows requires them when an installer installs an application.
  • Replication.
  • Generation of primary keys outside a database.

Pro/Cons

  • Pro:
    • A UUID does not need a central authority to be created
    • Quick: no round trip to a database
    • Unique across all space (i.e. across every table, every database, every server)
    • Replication: most replication scenarios require a GUID column
    • A UUID is reasonably small. This lends itself well to sorting, ordering, and hashing of all sorts, storing in databases, simple allocation, and ease of programming in general.
  • Cons:
    • Since a UUID has a fixed size and contains a time field, it is possible for values to roll over (around A.D. 3400, depending on the specific algorithm used).
    • At 16 bytes, it is 4 times larger than a traditional 4-byte index value

Variant and version (Type)

RFC 4122 defines one variant (layout) with five versions, also called the five sub-types of UUID. Only some of them are time-based:

  • 1 - The time-based version, which also uses the MAC address
  • 2 - The DCE Security version, with embedded POSIX UIDs
  • 3 - The name-based version that uses MD5 hashing
  • 4 - The randomly or pseudo-randomly generated version
  • 5 - The name-based version that uses SHA-1 hashing

Which one you choose depends on the use case.

The fully qualified type of a UUID is given by:

  • its variant field (a type): the variant field contains a value that identifies the layout. It consists of a variable number of the most significant bits of octet 8 of the UUID.
  • and its version (a subtype, one of the five versions described above): the version field holds a value that describes the type. It is in the most significant 4 bits of the timestamp (bits 4 through 7 of the time_hi_and_version field).

On a global level, only three algorithms are used to generate these UUIDs:
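As an illustration of the two fields described above, here is a minimal sketch that reads the version and the variant directly from the string representation. The helper names uuidVersion and uuidVariant are hypothetical, and the bit patterns in the comments follow the variant table of RFC 4122.

```javascript
// Read the version of a UUID: the most significant 4 bits of time_hi_and_version,
// i.e. the first hex digit of the third group.
function uuidVersion(uuid) {
  const groups = uuid.replace(/^urn:uuid:/i, "").split("-");
  return parseInt(groups[2][0], 16);
}

// Read the variant of a UUID: the most significant bits of octet 8,
// i.e. the first hex digit of the fourth group.
function uuidVariant(uuid) {
  const groups = uuid.replace(/^urn:uuid:/i, "").split("-");
  const high = parseInt(groups[3][0], 16);
  if ((high & 0b1000) === 0) return "NCS (reserved)";             // 0xxx
  if ((high & 0b1100) === 0b1000) return "RFC 4122";              // 10xx
  if ((high & 0b1110) === 0b1100) return "Microsoft (reserved)";  // 110x
  return "reserved for future use";                               // 111x
}

console.log(uuidVersion("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")); // 1
console.log(uuidVariant("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")); // RFC 4122
console.log(uuidVersion("27701bfc-78d0-4e2b-92ca-193cea53fa30")); // 4
```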

Algorithm | UUID Versions
Unique values of 802 MAC addresses | 1, 2
Pseudo-random numbers | 4
Cryptographic hashing and application-provided text strings | 3, 5

Version 1

A version 1 UUID uses:

  • the current timestamp
  • the MAC address of the computer that generates it

Therefore when reading a version 1 UUID, you can get:

  • the creation timestamp (the when)
  • the computer (the where)

Example: version 1 (the 1 in 11d0)

f81d4fae-7dec-11d0-a765-00a0c91e6bf6
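To make the "when" and the "where" concrete, the following sketch decodes the example above. It assumes the RFC 4122 layout, where the timestamp is a 60-bit count of 100-nanosecond intervals since 15 October 1582. The helper name decodeV1 is hypothetical.

```javascript
// Decode the timestamp and node of a version 1 UUID (decodeV1 is a hypothetical helper).
function decodeV1(uuid) {
  const [timeLow, timeMid, timeHiAndVersion, , node] = uuid.split("-");
  // Drop the version digit, then reassemble the 60-bit timestamp: high, mid, low.
  const timestamp = BigInt("0x" + timeHiAndVersion.slice(1) + timeMid + timeLow);
  // Number of 100-nanosecond intervals between 1582-10-15 and 1970-01-01.
  const GREGORIAN_TO_UNIX = 0x01b21dd213814000n;
  const unixMillis = Number((timestamp - GREGORIAN_TO_UNIX) / 10000n);
  return {
    createdAt: new Date(unixMillis),
    node: node.match(/../g).join(":"), // 48-bit node, usually a MAC address
  };
}

const decoded = decodeV1("f81d4fae-7dec-11d0-a765-00a0c91e6bf6");
console.log(decoded.createdAt.toISOString()); // 1997-02-03T17:43:12.216Z
console.log(decoded.node);                    // 00:a0:c9:1e:6b:f6
```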

Version 2

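A version 2 UUID uses:

  • the current timestamp
  • the MAC address of the computer that generates it
  • a local identifier such as the POSIX UID of the creator

Therefore when reading a version 2 UUID, you can get:

  • the creation timestamp (the when)
  • the computer (the where)
  • the creator (the who)

Be aware that this format comes with uniqueness problems.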

Version 3

Version 3 and version 5 are name-based UUIDs. They differ only in the hash function (version 3 uses MD5 while version 5 uses SHA-1).

They are meant for generating UUIDs from names that are unique inside a namespace.

A name-based UUID accepts as parameters:

  • a namespace (a properly formatted UUID)
  • and a name (a string of any length)

A name-based UUID is deterministic (not random). For the same namespace and name, you will always get the same UUID.

The specification defines the following namespaces (RFC 4122, Appendix C).

Name type | Namespace UUID
FQDN | 6ba7b810-9dad-11d1-80b4-00c04fd430c8
URL | 6ba7b811-9dad-11d1-80b4-00c04fd430c8
ISO object identifier (OID) | 6ba7b812-9dad-11d1-80b4-00c04fd430c8
X.500 DN (in DER or a text output format) | 6ba7b814-9dad-11d1-80b4-00c04fd430c8
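The name-based generation can be sketched as follows, assuming Node.js and its built-in crypto module. The namespace bytes and the name are hashed together, the first 16 bytes of the hash are kept, and the version and variant bits are overwritten. The helper name nameBasedUuid is hypothetical; this is an illustration, not a replacement for a tested library.

```javascript
const { createHash } = require("node:crypto");

// Predefined namespace for fully qualified domain names (FQDN).
const NAMESPACE_DNS = "6ba7b810-9dad-11d1-80b4-00c04fd430c8";

// nameBasedUuid is a hypothetical helper: version 5 uses SHA-1, version 3 uses MD5.
function nameBasedUuid(namespace, name, version = 5) {
  const namespaceBytes = Buffer.from(namespace.replace(/-/g, ""), "hex");
  const hash = createHash(version === 3 ? "md5" : "sha1")
    .update(namespaceBytes)
    .update(name, "utf8")
    .digest();
  const bytes = hash.subarray(0, 16);            // keep the first 128 bits
  bytes[6] = (bytes[6] & 0x0f) | (version << 4); // set the version
  bytes[8] = (bytes[8] & 0x3f) | 0x80;           // set the variant (10xx)
  const hex = bytes.toString("hex");
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
          hex.slice(16, 20), hex.slice(20)].join("-");
}

// The same namespace and name always give the same UUID.
console.log(nameBasedUuid(NAMESPACE_DNS, "python.org"));
console.log(nameBasedUuid(NAMESPACE_DNS, "python.org", 3));
```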

Version 4

The version 4 UUID is the most used and is meant for generating UUIDs from truly random or pseudo-random numbers.

Because it is not device scoped, it provides better privacy properties.

The chance of collision is so small that it can be ignored.

For instance, Google Analytics uses it to generate the Anonymous Id on the mobile platform.

Except for bits 6 and 7 of the clock_seq_hi_and_reserved field (the variant) and bits 12 through 15 of the time_hi_and_version field (the version), all other bits are random:

  • the timestamp is a randomly or pseudo-randomly generated 60-bit value
  • the clock sequence is a randomly or pseudo-randomly generated 14-bit value
  • the node field is a randomly or pseudo-randomly generated 48-bit value

The algorithm is explained in detail in section 4.4 of the specification.
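The bit layout described above can be sketched as follows, assuming Node.js and its built-in crypto module. Sixteen random bytes are generated, then only the version and variant bits are overwritten. The helper name randomUuidV4 is hypothetical.

```javascript
const { randomBytes } = require("node:crypto");

// randomUuidV4 is a hypothetical helper that follows the steps described above.
function randomUuidV4() {
  const bytes = randomBytes(16);       // 128 random bits
  bytes[6] = (bytes[6] & 0x0f) | 0x40; // version: the high 4 bits of octet 6 become 0100
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // variant: the high 2 bits of octet 8 become 10
  const hex = bytes.toString("hex");
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
          hex.slice(16, 20), hex.slice(20)].join("-");
}

console.log(randomUuidV4());
```

This leaves 122 random bits, which is the figure that matters when estimating the chance of a collision.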

Example: version 4 (the 4 in 4e2b)

27701bfc-78d0-4e2b-92ca-193cea53fa30

Version 5

Version 5 is, like version 3, a name-based UUID, but it uses the SHA-1 hash. For more information, see the Version 3 section.

Version 6

A time-ordered version with a Gregorian epoch, proposed as a new UUID format and since standardized in RFC 9562.

Version 7

A time-ordered version with a Unix epoch, proposed as a new UUID format and since standardized in RFC 9562.

Collision

The chance of collision is so small that it can be ignored.

For example, for the version-4 UUIDs, there is a 50% chance of collision when generating:

  • 2.71 quintillion UUIDs
  • or 1 billion UUIDs per second for about 85 years
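These figures can be approximated with the birthday bound. The sketch below assumes that a version 4 UUID carries 122 random bits and uses the usual approximation n ≈ sqrt(2 · N · ln(1 / (1 - p))), where N is the number of possible values and p the target collision probability. The function name is hypothetical.

```javascript
// A version 4 UUID has 128 bits, of which 6 are fixed (version and variant).
const RANDOM_BITS = 122;
const SPACE = 2 ** RANDOM_BITS; // number of possible version 4 UUIDs

// Birthday approximation: how many UUIDs before a collision has probability p.
function uuidsForCollisionProbability(p) {
  return Math.sqrt(2 * SPACE * Math.log(1 / (1 - p)));
}

const count = uuidsForCollisionProbability(0.5);
const years = count / 1e9 / (365.25 * 24 * 3600); // at 1 billion UUIDs per second

console.log(count.toExponential(2)); // 2.71e+18
console.log(years.toFixed(1));
```

The result lands close to the figures quoted above; the exact number of years depends on rounding and on the length of the year used.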

Format / Size

For all versions, the UUID:

  • has a fixed size of 128 bits (16 bytes)
  • contains a time field.

String

The hexadecimal string representation of a UUID is defined by the following BNF syntax:

'UUID' := 'time-low' "-" 'time-mid' "-" 'time-high-and-version' "-" 'clock-seq-and-reserved' 'clock-seq-low' "-" 'node'

where:

  • time-low = 4hexOctet
  • time-mid = 2hexOctet
  • time-high-and-version = 2hexOctet
  • clock-seq-and-reserved = hexOctet
  • clock-seq-low = hexOctet
  • node = 6hexOctet

Note that:

  • hexOctet = hexDigit hexDigit
  • hexDigit = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" / "a" / "b" / "c" / "d" / "e" / "f" / "A" / "B" / "C" / "D" / "E" / "F"

Because most applications use UUIDs as URNs, the bit sequence is generally converted to this string representation.
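The conversion between the 16-byte sequence and the string representation can be sketched as follows. The group sizes follow the grammar above (4, 2, 2, 2 and 6 octets). The helper names uuidBytesToString and uuidStringToBytes are hypothetical.

```javascript
// Convert 16 bytes to the hyphenated hexadecimal string representation.
function uuidBytesToString(bytes) {
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  // time-low (4 octets), time-mid (2), time-high-and-version (2),
  // clock-seq-and-reserved + clock-seq-low (2), node (6)
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
          hex.slice(16, 20), hex.slice(20, 32)].join("-");
}

// Convert the string representation (with or without the urn prefix) back to 16 bytes.
function uuidStringToBytes(uuid) {
  const hex = uuid.replace(/^urn:uuid:/i, "").replace(/-/g, "");
  return Uint8Array.from(hex.match(/../g), (pair) => parseInt(pair, 16));
}

const uuidBytes = uuidStringToBytes("urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6");
console.log(uuidBytes.length);             // 16
console.log(uuidBytesToString(uuidBytes)); // f81d4fae-7dec-11d0-a765-00a0c91e6bf6
```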

Pattern

A regular expression pattern that matches a version 4 UUID:

  • php
const UUID4_PATTERN = "/^[0-9A-F]{8}-[0-9A-F]{4}-[4][0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i";
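A JavaScript equivalent of the pattern above might look like this. It is a sketch that checks the shape of the string only: the version digit must be 4 and the first digit of the fourth group must be 8, 9, a or b.

```javascript
// Same pattern as the PHP constant, written as a JavaScript regular expression literal.
const UUID4_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

console.log(UUID4_PATTERN.test("27701bfc-78d0-4e2b-92ca-193cea53fa30")); // true
console.log(UUID4_PATTERN.test("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")); // false (version 1)
```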

Specification

UUIDs are specified in RFC 4122.

Implementation

Javascript Code

Javascript UUID4

  • The web API crypto.randomUUID generates a version 4 UUID
console.log(window.crypto.randomUUID());
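  • A compact implementation may look like this (it relies on crypto.getRandomValues)
function uuidv4() {
  // [1e7]+-1e3+-4e3+-8e3+-1e11 builds the template "10000000-1000-4000-8000-100000000000"
  return ([1e7]+-1e3+-4e3+-8e3+-1e11).replace(/[018]/g, c =>
    // every 0 and 1 becomes a random hex digit, the 8 becomes 8, 9, a or b (the variant);
    // the 4 (the version) is not matched by the pattern and is kept as is
    (c ^ crypto.getRandomValues(new Uint8Array(1))[0] & 15 >> c / 4).toString(16)
  );
}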

console.log(uuidv4());

Javascript Package

https://www.npmjs.com/package/uuid

Example:

  • the import (assuming an ES module setup)
import * as uuid from 'uuid';
  • the string
let uuidString = uuid.v4();
console.log(`Hexadecimal String: ${uuidString}`);
  • its size in characters
console.log(`Hexadecimal String Character length: ${uuidString.length}`);
  • its size in bits, without the 4 hyphens
console.log(`Hexadecimal String Bit length: ${(uuidString.length - 4) * 4}`);
  • its binary representation
let uuidByteArray = uuid.parse(uuidString); // 16 bytes array
console.log(`Binary byte length: ${uuidByteArray.length}`);
console.log(`Binary bit length: ${uuidByteArray.length * 8}`);
console.log(uuidByteArray);

Windows SDK

The Windows SDK you get by installing Visual Studio also contains a GUID-generation tool, available, for example, at

%ProgramFiles%\Microsoft SDKs\Windows\v7.1A\Bin\uuidgen.exe
%ProgramFiles%\Windows Kits\8.1\bin\x86\uuidgen.exe

Database

In a database, you can store a UUID:

  • in binary (smaller)
  • or in text

Storing the UUID in binary makes the data unreadable for humans, which is not really practical when writing queries.

The standard hexadecimal text representation can be shortened by:

  • using a bigger base (such as base64)
  • and deleting the separator
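As a sketch of this shortening, assuming Node.js (version 15.7 or later for the base64url encoding of Buffer): the hyphens are removed, the 16 bytes are re-encoded in URL-safe base64, and the 36-character string becomes a 22-character one. The helper names shortenUuid and expandUuid are hypothetical.

```javascript
// Shorten a UUID: drop the separators and re-encode the 16 bytes in base64url.
function shortenUuid(uuid) {
  return Buffer.from(uuid.replace(/-/g, ""), "hex").toString("base64url");
}

// Restore the standard hexadecimal representation from the short form.
function expandUuid(shortId) {
  const hex = Buffer.from(shortId, "base64url").toString("hex");
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
          hex.slice(16, 20), hex.slice(20)].join("-");
}

const shortId = shortenUuid("27701bfc-78d0-4e2b-92ca-193cea53fa30");
console.log(shortId.length);      // 22
console.log(expandUuid(shortId)); // 27701bfc-78d0-4e2b-92ca-193cea53fa30
```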

If you still want a smaller id, you can always create one that suits your needs.

Query performance should not really suffer, as even an index on text uses a binary representation.
