What is XML / HTML Character Entity Encoding?

About

An entity in HTML is a string that represents a Unicode character.

In other words, an entity is a fully qualified notation that can represent any Unicode character.

Encoding text in HTML means transforming its characters into entity notation.

Example

Complex Character: Phone

Take the phone character. Its Unicode code point is U+260E (9742 in decimal).

To show a phone in an HTML document, you can write any of the following entity notations:
<ul>
  <li>&#x0260E; (hexadecimal)</li>
  <li>&#9742; (decimal)</li>
  <li>&phone; (name)</li>
</ul>  
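
The decimal and hexadecimal notations are the same code point written in two different bases. As a minimal sketch (the helper name `numericReferences` is an assumption made for illustration), both can be derived from the character itself:

```javascript
// Hypothetical helper: build both numeric notations for one character.
function numericReferences(char) {
  const codePoint = char.codePointAt(0); // 9742 for the phone character
  return {
    decimal: `&#${codePoint};`,
    hexadecimal: `&#x${codePoint.toString(16).toUpperCase()};`,
  };
}

const phone = numericReferences('\u260E');
console.log(phone.decimal);     // &#9742;
console.log(phone.hexadecimal); // &#x260E;
```

Leading zeros are allowed in a numeric notation, which is why `&#x0260E;` and `&#x260E;` designate the same character.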

Simple Character: letter A

This example shows that any simple character (i.e. one from the alphabet) can also be written as an entity.

Take the letter A. Its Unicode code point is U+0041 (65 in decimal).

Therefore, to show the letter A in an HTML document, you can write any of the following notations:
<ul>
  <li>&#x41; (hexadecimal)</li>
  <li>&#65; (decimal)</li>
  <li>A (the letter A)</li>
</ul>  

Usage

Reserved Word Encoding

Entities are used to encode reserved XML/HTML characters that appear in content, for instance in the value of an attribute.

For example, the start character < and the end character > of an element tag cannot be used directly. They need to be replaced (i.e. encoded) with their entity notation.

The character > is replaced by the entity &gt;
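
A minimal sketch of this replacement in JavaScript follows. The helper name `escapeReserved` and the choice of exactly four reserved characters are assumptions made for illustration; a real application would usually rely on a library or on the templating engine.

```javascript
// Hypothetical helper: replace the reserved characters with named entities.
const RESERVED = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' };

function escapeReserved(text) {
  // A single pass, so the & of an inserted entity is never escaped again.
  return text.replace(/[&<>"]/g, (char) => RESERVED[char]);
}

console.log(escapeReserved('a > b && "c" < d'));
// a &gt; b &amp;&amp; &quot;c&quot; &lt; d
```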

Complex Characters

They are also used to show complex / special characters that are not easily accessible from the keyboard.

Format

The entity notation supports three definitions for a character:

&name; <!-- name notation -->
&#dddd;  <!-- decimal notation -->
&#xhhhh; <!-- hexadecimal notation -->

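where name is the name of the entity, dddd is the code point of the character in decimal, and hhhh is the code point in hexadecimal.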

Not all entities are supported by old browsers, but support in recent browsers is good.
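
The three notations can be told apart by their shape alone. The sketch below is illustrative only: `entityNotation` is a hypothetical helper, and it checks the syntax of the entity, not whether the name is actually defined.

```javascript
// Hypothetical helper: report which of the three notations an entity uses.
function entityNotation(entity) {
  if (/^&#[xX][0-9a-fA-F]+;$/.test(entity)) return 'hexadecimal';
  if (/^&#[0-9]+;$/.test(entity)) return 'decimal';
  if (/^&[A-Za-z][A-Za-z0-9]*;$/.test(entity)) return 'name';
  return null; // not an entity in any of the three notations
}

console.log(entityNotation('&gt;'));   // name
console.log(entityNotation('&#62;'));  // decimal
console.log(entityNotation('&#x3E;')); // hexadecimal
```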

List

This list is not exhaustive; see the named character reference for all names.

Character Description                       Entity Name  Decimal  Hex       Rendering
quotation mark = APL quote                  &quot;       &#34;    &#x22;
ampersand                                   &amp;        &#38;    &#x26;    &
less-than sign                              &lt;         &#60;    &#x3C;    <
greater-than sign                           &gt;         &#62;    &#x3E;    >
Latin capital ligature OE                   &OElig;      &#338;   &#x152;   Œ
Latin small ligature oe                     &oelig;      &#339;   &#x153;   œ
Latin capital letter S with caron           &Scaron;     &#352;   &#x160;   Š
Latin small letter s with caron             &scaron;     &#353;   &#x161;   š
Latin capital letter Y with diaeresis       &Yuml;       &#376;   &#x178;   Ÿ
modifier letter circumflex accent           &circ;       &#710;   &#x2C6;   ˆ
small tilde                                 &tilde;      &#732;   &#x2DC;   ˜
en space                                    &ensp;       &#8194;  &#x2002;
em space                                    &emsp;       &#8195;  &#x2003;
thin space                                  &thinsp;     &#8201;  &#x2009;
zero width non-joiner                       &zwnj;       &#8204;  &#x200C;
zero width joiner                           &zwj;        &#8205;  &#x200D;
left-to-right mark                          &lrm;        &#8206;  &#x200E;
right-to-left mark                          &rlm;        &#8207;  &#x200F;
en dash                                     &ndash;      &#8211;  &#x2013;
em dash                                     &mdash;      &#8212;  &#x2014;
left single quotation mark                  &lsquo;      &#8216;  &#x2018;
right single quotation mark                 &rsquo;      &#8217;  &#x2019;
single low-9 quotation mark                 &sbquo;      &#8218;  &#x201A;
left double quotation mark                  &ldquo;      &#8220;  &#x201C;
right double quotation mark                 &rdquo;      &#8221;  &#x201D;
double low-9 quotation mark                 &bdquo;      &#8222;  &#x201E;
dagger                                      &dagger;     &#8224;  &#x2020;
double dagger                               &Dagger;     &#8225;  &#x2021;
per mille sign                              &permil;     &#8240;  &#x2030;
single left-pointing angle quotation mark   &lsaquo;     &#8249;  &#x2039;
single right-pointing angle quotation mark  &rsaquo;     &#8250;  &#x203A;
euro sign                                   &euro;       &#8364;  &#x20AC;

Glyphs for these characters are available from the Unicode Consortium and should already be available in every browser.

Function / Library

Function

Encode

// Encode every character of the text as a decimal entity.
function toEntities(text) {
    const entities = [];
    // for...of iterates by code point, so characters outside the
    // Basic Multilingual Plane are not split into surrogate halves.
    for (const char of text) {
        entities.push(`&#${char.codePointAt(0)};`);
    }
    return entities.join('');
}
const reservedCharacters = `"><`;
const entities = toEntities(reservedCharacters);
console.log(`The reserved characters (${reservedCharacters}) in entity format are (${entities})`);
// The entities keep the quote and the angle brackets from breaking the title attribute.
const anchorHTML = `<a href="#" title="${entities}">Anchor with entities</a> Keep your mouse on the link to see the title tooltip.`;
document.body.insertAdjacentHTML('afterbegin', anchorHTML);
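
Encoding every character is safe but verbose. As a hedged variant (the helper name `toHexEntities` and the rule "reserved or outside ASCII" are assumptions, not part of the function above), only the characters that need it can be encoded, here in hexadecimal notation:

```javascript
// Hypothetical helper: encode only reserved and non-ASCII characters,
// using the hexadecimal notation, and leave everything else as is.
function toHexEntities(text) {
  let output = '';
  for (const char of text) { // iterate by code point
    const codePoint = char.codePointAt(0);
    if ('&<>"'.includes(char) || codePoint > 0x7e) {
      output += `&#x${codePoint.toString(16).toUpperCase()};`;
    } else {
      output += char;
    }
  }
  return output;
}

console.log(toHexEntities('Call \u260E <now>'));
// Call &#x260E; &#x3C;now&#x3E;
```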

Decode

When decoding, your function should take into account the three formats (name, decimal and hexadecimal).

The JavaScript function below shows an example for the decimal form only, using a basic regular expression replace:

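// Decode decimal entities only (for example &#62;).
function decodeDecimalEntity(text) {
  return text.replace(/&#(\d+);/g, function(match, dec) {
     const codePoint = Number(dec);
     // fromCodePoint also handles code points above 0xFFFF;
     // references beyond the Unicode range are left untouched.
     return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
  });
}
console.log(decodeDecimalEntity('&#62;')); // >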
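
To cover the three formats, the same idea can be extended. The following is a minimal sketch, not a complete decoder: `decodeEntities` is a hypothetical name, and the `NAMED` table holds only a handful of names for illustration, whereas HTML defines many more.

```javascript
// Assumption: a tiny name table for illustration; HTML defines far more names.
const NAMED = new Map([['quot', 34], ['amp', 38], ['lt', 60], ['gt', 62], ['euro', 8364]]);

function decodeEntities(text) {
  const pattern = /&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);/g;
  return text.replace(pattern, (match, body) => {
    let codePoint;
    if (/^#[xX]/.test(body)) {
      codePoint = parseInt(body.slice(2), 16); // hexadecimal notation
    } else if (body.startsWith('#')) {
      codePoint = parseInt(body.slice(1), 10); // decimal notation
    } else {
      codePoint = NAMED.get(body);             // name notation
    }
    // Leave unknown names and out-of-range numbers untouched.
    if (codePoint === undefined || codePoint > 0x10ffff) return match;
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeEntities('1 &lt; 2 &#38; 3 &#x3E; 2')); // 1 < 2 & 3 > 2
```

Because the replacement runs in a single pass, `&amp;lt;` decodes to `&lt;` and is not decoded a second time.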

Pure Library (Encode/Decode)

Libraries already provide encode/decode functions and may add extra functionality.

Library from Ascii to Entities

A library may also implement a mapping from an ASCII sequence of characters to an entity.

In a font, this kind of mapping is called a ligature.

For instance:
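
As a rough sketch of such a mapping, the code below replaces a few ASCII sequences with entities. The sequences chosen and the names `SEQUENCES` and `replaceSequences` are assumptions for illustration; an actual library defines its own table.

```javascript
// Assumption: an illustrative mapping; real libraries define their own sequences.
const SEQUENCES = [
  ['->', '&rarr;'],
  ['(c)', '&copy;'],
  ['...', '&hellip;'],
];

function replaceSequences(text) {
  let result = text;
  for (const [sequence, entity] of SEQUENCES) {
    result = result.split(sequence).join(entity); // replace every occurrence
  }
  return result;
}

console.log(replaceSequences('(c) Acme -> next...'));
// &copy; Acme &rarr; next&hellip;
```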

List: