About
This article is about character representation and manipulation in JavaScript (i.e. code points).
They:
- are all UTF-16 encoded Unicode characters
- are elements of a string, indexed starting at 0
- may have a length of two (for Unicode characters above the 16-bit code unit range), as the sketch below shows
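A minimal sketch of these properties (the string and variable names are illustrative):
let s = 'a😀';
console.log(s[0]);     // 'a': the element at index 0
console.log(s.length); // 3, not 2: the emoji alone occupies two code units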
You don't need JavaScript to show a Unicode character in HTML. See: HTML - How to show a Unicode Character in HTML
Creation
from Literal
From a literal
- A character is just a string with one character:
let char = 'a';
console.log(`The character a: ${char}`);
- Example with the Unicode hexadecimal escape notation and the High Five character (U+270B):
console.log('\u270B');
- For characters above 16 bits, such as the Grinning Face (U+1F600), you need to use a surrogate pair (\uD83D\uDE00):
console.log('\uD83D\uDE00');
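In an ES2015+ environment (an assumption about your runtime), you can also let the engine compute the surrogate pair for you with the code point escape syntax:
console.log('\u{1F600}'); // same output as the surrogate pair above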
from String
let foo = "foo \u270B";
let character = foo.charAt(foo.length - 1); // the character at the last index
console.log(character);
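charAt works here because \u270B fits in a single 16-bit code unit; for a character above that range it would return only half of a surrogate pair. A minimal, code-point-aware alternative using the spread operator (the variable names are illustrative):
let bar = 'bar \u{1F600}';
let chars = [...bar]; // spreading a string iterates by code point, not by code unit
console.log(chars[chars.length - 1]); // the whole Grinning Face character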
from Code Point (number)
From a code point (i.e. the index of the character in the character set).
Example with the High Five character (U+270B):
let hexa = '270B';
let codePoint = parseInt(hexa, 16);
let character = String.fromCodePoint(codePoint);
console.log(`The character with the code point (${codePoint}) is ${character}`);
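The reverse direction, from a character back to its code point, goes through the codePointAt method (reusing the character variable from the snippet above):
console.log(character.codePointAt(0)); // 9995, the decimal value of 0x270B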
Length
For a single character, you may get a length of one or two in JavaScript.
Why? Because:
- JavaScript uses UTF-16 as its character encoding, so Unicode characters with a code point above the 16-bit range (i.e. above 65535) cannot be represented by a single code unit. They therefore use two code units, known as a surrogate pair. See Unicode - Surrogate pair (UTF-16)
- the JavaScript length property returns the number of code units, so you may get a value of two for one character.
Example with the Grinning Face (U+1F600):
console.log('😀'.length); // 2
Other examples of characters encoded with two code units:
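Any code point above U+FFFF behaves the same way; the two below are just illustrative picks:
console.log('\u{1D11E}'.length); // 2: Musical Symbol G Clef (U+1D11E)
console.log('\u{1F4A9}'.length); // 2: Pile of Poo emoji (U+1F4A9)
To count characters rather than code units, you can spread the string first:
console.log([...'😀'].length); // 1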