Unicode - Surrogate pair (UTF-16)

1 - About

A single UTF-16 code unit is 16 bits (two bytes) wide and can only represent the code points from 0x0 to 0xFFFF. To represent a code point above this range (from 0x10000 to 0x10FFFF), a pair of code units known as surrogates is used.

That is, a surrogate pair is two 16-bit code units that together encode one code point.
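
JavaScript strings are sequences of UTF-16 code units, so the difference between code units and code points can be observed directly. The following is a minimal sketch; the helper name `describeString` is only an illustration and is not part of any library.

```javascript
// describeString is a hypothetical helper used only for this illustration.
function describeString(text) {
  return {
    codeUnits: text.length,        // length counts 16-bit code units
    codePoints: [...text].length,  // string iteration walks code points
    firstCodePoint: text.codePointAt(0).toString(16).toUpperCase()
  };
}

console.log(describeString('A'));         // a character inside the 0x0-0xFFFF range
console.log(describeString('\u{1F600}')); // a character above 0xFFFF
```

A character above 0xFFFF reports two code units for a single code point.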

3 - Syntax

The surrogate pair is composed of two code units known as:

  • the “high surrogate” (first code unit) - 0xD800–0xDBFF
  • the “low surrogate” (second code unit) - 0xDC00–0xDFFF
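
The two ranges can be turned into a small classification function. This is a sketch only; `surrogateKind` is a hypothetical name chosen for the example.

```javascript
// surrogateKind is a hypothetical helper: it classifies one 16-bit code unit.
function surrogateKind(codeUnit) {
  if (codeUnit >= 0xD800 && codeUnit <= 0xDBFF) return 'high';
  if (codeUnit >= 0xDC00 && codeUnit <= 0xDFFF) return 'low';
  return 'none';
}

const smiley = '\uD83D\uDE00';
console.log(surrogateKind(smiley.charCodeAt(0))); // first code unit of the pair
console.log(surrogateKind(smiley.charCodeAt(1))); // second code unit of the pair
console.log(surrogateKind('A'.charCodeAt(0)));    // not a surrogate
```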

4 - Formula

4.1 - to surrogate

<MATH> H = \lfloor (S - 10000_{16}) / 400_{16} \rfloor + D800_{16} \\ L = ((S - 10000_{16}) \bmod 400_{16}) + DC00_{16} </MATH>

The division is an integer division: the result is rounded down.
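
As a worked example, take the code point <MATH> S = 1F600_{16} </MATH> (chosen here only for illustration):

<MATH> S - 10000_{16} = 1F600_{16} - 10000_{16} = F600_{16} \\ H = \lfloor F600_{16} / 400_{16} \rfloor + D800_{16} = 3D_{16} + D800_{16} = D83D_{16} \\ L = (F600_{16} \bmod 400_{16}) + DC00_{16} = 200_{16} + DC00_{16} = DE00_{16} </MATH>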

4.2 - to code point

<MATH> S = (H - D800_{16}) * 400_{16} + (L - DC00_{16}) + 10000_{16} </MATH>
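
As a worked example, take the pair <MATH> H = D83D_{16} </MATH> and <MATH> L = DE00_{16} </MATH> (chosen here only for illustration):

<MATH> S = (D83D_{16} - D800_{16}) \times 400_{16} + (DE00_{16} - DC00_{16}) + 10000_{16} \\ S = 3D_{16} \times 400_{16} + 200_{16} + 10000_{16} \\ S = F400_{16} + 200_{16} + 10000_{16} = 1F600_{16} </MATH>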

5 - Usage

In a string, the high surrogate comes first and is immediately followed by the low surrogate: <MATH> \text{Surrogate Pair} = \text{(High Surrogate)}\text{(Low Surrogate)} \\ S = HL </MATH>

Example in JavaScript


console.log('\uD83D\uDC69');
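
The same character can also be written with the code point escape or built from numbers. The sketch below compares these notations; the variable names are only illustrative.

```javascript
// Four ways to write the same one-character string (U+1F469).
const fromPair = '\uD83D\uDC69';                           // surrogate pair escapes
const fromEscape = '\u{1F469}';                            // code point escape
const fromCharCodes = String.fromCharCode(0xD83D, 0xDC69); // two code units
const fromCodePoint = String.fromCodePoint(0x1F469);       // one code point

console.log(fromPair === fromEscape);
console.log(fromPair === fromCharCodes);
console.log(fromPair === fromCodePoint);
console.log(fromPair.length); // counted in code units
```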

6 - Management

6.1 - Concatenate

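Some code points can be concatenated into a sequence that is rendered as a single glyph, for instance an emoji joined with the zero-width joiner (U+200D), or a base character followed by a combining mark. The result is still a sequence of several code points, not a new code point.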


console.log("Emoji concatenation");
console.log('\uD83D\uDC69 + \u200D\u2764\uFE0F\u200D = \uD83D\uDC69\u200D\u2764\uFE0F\u200D');
console.log("");
console.log("Does not work for all combinations");
console.log('\uD83D\uDE00 + \u200D\u2764\uFE0F\u200D = \uD83D\uDE00\u200D\u2764\uFE0F\u200D');
console.log("");
console.log("Works also with base character");
console.log('\u0065 + \u0301 = \u0065\u0301');
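
To check what such a concatenation really contains, the code units and code points can be counted. This is a minimal sketch; `countParts` is a hypothetical helper name.

```javascript
// countParts is a hypothetical helper: it counts code units and code points.
function countParts(text) {
  return { codeUnits: text.length, codePoints: [...text].length };
}

// Emoji + zero-width joiner sequence taken from the example above
const sequence = '\uD83D\uDC69\u200D\u2764\uFE0F\u200D';
console.log(countParts(sequence));

// Base character + combining acute accent
const combined = '\u0065\u0301';
console.log(countParts(combined));

// Unicode normalization (NFC) composes this pair into the single code point U+00E9
console.log(combined.normalize('NFC') === '\u00E9');
```

Only the surrogate pair in the emoji sequence collapses two code units into one code point; every other element stays a separate code point.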

6.2 - Calculator

In JavaScript

6.2.1 - From Code Point to Surrogate Pair


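function surrogateCalculator(S){
   // Accepts a number or a string such as "0x1F600" (the form value)
   const offset = Number(S) - 0x10000;
   const H = Math.floor(offset / 0x400) + 0xD800; // high surrogate
   const L = (offset % 0x400) + 0xDC00;           // low surrogate
   return [H,L];
}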

function printSurrogatePair(codePoint){
   let [H,L] = surrogateCalculator(codePoint);
   let outputText = 'The UTF-16 surrogate pair for '+codePoint+' is \\u'+H.toString(16).toUpperCase()+'\\u'+L.toString(16).toUpperCase()+' ('+String.fromCodePoint(Number(codePoint))+')';
   console.log(outputText);
}


<form onSubmit="printSurrogatePair(this.codePoint.value); return false;">
<p><label>Enter the code point:</label></p>
<p><input type="text" name="codePoint" value="0x1F600"></input></p>
<p><button type="submit">Show the surrogate pair</button></p>
</form>
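
The calculation can be cross-checked against the code units that the JavaScript engine itself produces. The following is a sketch under that assumption; `toSurrogatePair` and `matchesBuiltIn` are hypothetical names, and the range check is an addition that the calculator above does not perform.

```javascript
// toSurrogatePair is a hypothetical variant of the calculator with a range check.
function toSurrogatePair(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError('Expected a code point between 0x10000 and 0x10FFFF');
  }
  const offset = codePoint - 0x10000;
  const high = Math.floor(offset / 0x400) + 0xD800;
  const low = (offset % 0x400) + 0xDC00;
  return [high, low];
}

// Compare the formula with the code units of the string built by the engine.
function matchesBuiltIn(codePoint) {
  const text = String.fromCodePoint(codePoint);
  const [high, low] = toSurrogatePair(codePoint);
  return text.charCodeAt(0) === high && text.charCodeAt(1) === low;
}

console.log(toSurrogatePair(0x1F600).map(unit => unit.toString(16).toUpperCase()));
console.log(matchesBuiltIn(0x10000), matchesBuiltIn(0x1F600), matchesBuiltIn(0x10FFFF));
```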

6.2.2 - From Surrogate Pair to Code Point


function codePointCalculator(H,L){
   return ((H - 0xD800) * 0x400) + (L - 0xDC00) + 0x10000;
}

function printCodePoint(H,L){
   // The form values arrive as strings such as "0xD83D": convert them to numbers first
   H = Number(H);
   L = Number(L);
   let S = codePointCalculator(H,L);
   let outputText = 'The code point for the surrogate pair \\u'+H.toString(16).toUpperCase()+'\\u'+L.toString(16).toUpperCase()+" is "+S.toString(16).toUpperCase()+' ('+String.fromCodePoint(S)+')';
   console.log(outputText);
}


<form onSubmit="printCodePoint(this.H.value,this.L.value); return false;">
<p><label>Enter the high surrogate:</label></p>
<p><input type="text" name="H" value="0xD83D"></input></p>
<p><label>Enter the low surrogate:</label></p>
<p><input type="text" name="L" value="0xDE00"></input></p>
<p><button type="submit">Show the code point</button></p>
</form>
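
The reverse calculation can be checked the same way with the built-in `codePointAt`. This is a sketch only; `fromSurrogatePair` is a hypothetical name, and the range checks are an addition that the calculator above does not perform.

```javascript
// fromSurrogatePair is a hypothetical variant of the calculator with range checks.
function fromSurrogatePair(high, low) {
  if (high < 0xD800 || high > 0xDBFF) {
    throw new RangeError('High surrogate must be between 0xD800 and 0xDBFF');
  }
  if (low < 0xDC00 || low > 0xDFFF) {
    throw new RangeError('Low surrogate must be between 0xDC00 and 0xDFFF');
  }
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

const decoded = fromSurrogatePair(0xD83D, 0xDE00);
console.log(decoded.toString(16).toUpperCase());

// Cross-check: the engine decodes the same two code units to the same code point.
console.log(String.fromCharCode(0xD83D, 0xDE00).codePointAt(0) === decoded);
```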

7 - Documentation / Reference

7.1 - Calculator Reference

