Unicode - Surrogate pair (UTF-16)

1 - About

As UTF-16 (16-bit - two-byte) can only contain the range of characters from 0x0 to 0xFFFF, to represent the code point above this range (such 0x10000 to 0x10FFFF, ), a pairs of code units known as surrogates is used.

Ie a surrogate pair is two 16-bit code units.

3 - Syntax

The surrogate pair is composed of two pair known as:

  • the “high surrogate” pair - 0xD800–0xDBFF
  • the “low surrogate” pair - 0xDC00–0xDFFF

4 - Formula

4.1 - to surrogate

<MATH> H = (S - 10000_{16}) / 400_{16} + D800_{16} \\ L = (S - 10000_{16}) % 400_{16} + DC00_{16} </MATH>

4.2 - to code point

<MATH> S = (H - D800_{16}) * 400_{16} + (L - DC00_{16}) + 10000_{16} </MATH>

5 - Usage

When used in a string <MATH> \text{Surrogate Pair} = \text{(High Surrogate)}\text{(Low Surrogate)} \\ S = HL </MATH>

Example in Javascript

console.log('\uD83D\uDC69');

6 - Management

6.1 - Concatenate

Some pairs can be concatenated to obtain another code point

console.log("Emoji concatenation");
console.log('\uD83D\uDC69 + \u200D\u2764\uFE0F\u200D = \uD83D\uDC69\u200D\u2764\uFE0F\u200D');
console.log("");
console.log("Does not work for all combinations");
console.log('\uD83D\uDE00 + \u200D\u2764\uFE0F\u200D = \uD83D\uDE00\u200D\u2764\uFE0F\u200D');
console.log("");
console.log("Works also with base character");
console.log('\u0065 + \u0301 = \u0065\u0301');

6.2 - Calculator

In Javascript

6.2.1 - From Code Point to Surrogate Pair

6.2.2 - From Surrogate Pair to Code Point

7 - Documentation / Reference

7.1 - Calculator Reference


Data Science
Data Analysis
Statistics
Data Science
Linear Algebra Mathematics
Trigonometry

Powered by ComboStrap