An entity in HTML is a string that represents a Unicode character.
In other words, an entity is a fully qualified notation that can represent any Unicode character.
Encoding text in HTML means transforming characters into their entity notation.
Example with the phone character (☎). This character has the Unicode value U+260E (decimal 9742).
To show a phone in an HTML document, you can write any of the following entity notations:
<ul>
<li>&#x260E; (hexadecimal)</li>
<li>&#9742; (decimal)</li>
<li>&phone; (name)</li>
</ul>
You can also write any simple character (i.e. a letter of the alphabet) as an entity.
Example with the letter A. This character has the Unicode value U+0041 (decimal 65).
Therefore, to show the letter A in an HTML document, you can write any of the following notations:
<ul>
<li>&#x41; (hexadecimal)</li>
<li>&#65; (decimal)</li>
<li>A (the letter A)</li>
</ul>
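The numeric notations of the two examples can be derived from the character itself. The following is a minimal sketch, assuming a hypothetical helper named numericEntities (it is not part of any library) that reads the Unicode code point and formats it in decimal and hexadecimal:

```javascript
// Build the decimal and hexadecimal entity notations of a single character.
// numericEntities is a hypothetical helper name used for illustration only.
function numericEntities(char) {
  const codePoint = char.codePointAt(0); // Unicode code point of the character
  return {
    decimal: `&#${codePoint};`,
    hexadecimal: `&#x${codePoint.toString(16).toUpperCase()};`,
  };
}

console.log(numericEntities('☎')); // decimal: &#9742;  hexadecimal: &#x260E;
console.log(numericEntities('A')); // decimal: &#65;    hexadecimal: &#x41;
```

The named notation cannot be computed this way because names come from a predefined list.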
They are used to encode reserved XML/HTML characters, such as those that appear in the value of an attribute.
For instance, the start (<) and end (>) characters of an element tag cannot be used directly. They need to be replaced (i.e. encoded) with their entity notation.
For instance, the character > would be replaced by the entity &gt;
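As an illustration of this replacement, here is a minimal sketch that encodes the four most common reserved characters with their named entities. The names RESERVED and escapeReserved are assumptions made for this example, and the character set is deliberately small:

```javascript
// Reserved characters and their named entities (illustrative subset).
const RESERVED = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' };

// Replace every reserved character in a single pass, so that the ampersand
// of an entity that was just produced is never encoded a second time.
function escapeReserved(text) {
  return text.replace(/[&<>"]/g, (char) => RESERVED[char]);
}

console.log(escapeReserved('<a title="x > y">'));
// &lt;a title=&quot;x &gt; y&quot;&gt;
```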
They are also used to show complex / special characters that are not easily accessible from the keyboard.
The entity notation supports three forms for a character:
&name; <!-- name notation -->
&#dddd; <!-- decimal notation -->
&#xhhhh; <!-- hexadecimal notation -->
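where:
<ul>
<li>name is a predefined entity name (for instance quot, amp, lt or gt)</li>
<li>dddd is the Unicode code point of the character in decimal</li>
<li>hhhh is the Unicode code point of the character in hexadecimal</li>
</ul>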
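The three forms can be told apart with a regular expression. The sketch below is only an approximation of the syntax (the function name parseEntity and the shape of the returned object are assumptions made for this example); it does not check whether a name really exists in the list of named character references:

```javascript
// Classify a single entity string as name, decimal or hexadecimal notation.
// Returns null when the string does not look like an entity.
function parseEntity(entity) {
  const pattern = /^&(?:#[xX]([0-9a-fA-F]+)|#(\d+)|([A-Za-z][A-Za-z0-9]*));$/;
  const match = pattern.exec(entity);
  if (match === null) return null;
  const [, hex, dec, name] = match;
  if (hex !== undefined) return { notation: 'hexadecimal', codePoint: parseInt(hex, 16) };
  if (dec !== undefined) return { notation: 'decimal', codePoint: parseInt(dec, 10) };
  return { notation: 'name', name };
}

console.log(parseEntity('&#x260E;')); // notation: hexadecimal, codePoint: 9742
console.log(parseEntity('&#9742;'));  // notation: decimal, codePoint: 9742
console.log(parseEntity('&phone;'));  // notation: name, name: phone
```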
Not all entities are supported by old browsers, but support in recent browsers is good.
This list is non-exhaustive; see the named character reference for all names.
Character Description | Entity Name | Decimal | Hex | Rendering (Entity Name) | Rendering (Unicode Decimal) | Rendering (Unicode Hex) |
---|---|---|---|---|---|---|
quotation mark = APL quote | &quot; | &#34; | &#x22; | " | " | " |
ampersand | &amp; | &#38; | &#x26; | & | & | & |
less-than sign | &lt; | &#60; | &#x3C; | < | < | < |
greater-than sign | &gt; | &#62; | &#x3E; | > | > | > |
Latin capital ligature OE | &OElig; | &#338; | &#x152; | Œ | Œ | Œ |
Latin small ligature oe | &oelig; | &#339; | &#x153; | œ | œ | œ |
Latin capital letter S with caron | &Scaron; | &#352; | &#x160; | Š | Š | Š |
Latin small letter s with caron | &scaron; | &#353; | &#x161; | š | š | š |
Latin capital letter Y with diaeresis | &Yuml; | &#376; | &#x178; | Ÿ | Ÿ | Ÿ |
modifier letter circumflex accent | &circ; | &#710; | &#x2C6; | ˆ | ˆ | ˆ |
small tilde | &tilde; | &#732; | &#x2DC; | ˜ | ˜ | ˜ |
en space | &ensp; | &#8194; | &#x2002; | | | |
em space | &emsp; | &#8195; | &#x2003; | | | |
thin space | &thinsp; | &#8201; | &#x2009; | | | |
zero width non-joiner | &zwnj; | &#8204; | &#x200C; | | | |
zero width joiner | &zwj; | &#8205; | &#x200D; | | | |
left-to-right mark | &lrm; | &#8206; | &#x200E; | | | |
right-to-left mark | &rlm; | &#8207; | &#x200F; | | | |
en dash | &ndash; | &#8211; | &#x2013; | – | – | – |
em dash | &mdash; | &#8212; | &#x2014; | — | — | — |
left single quotation mark | &lsquo; | &#8216; | &#x2018; | ‘ | ‘ | ‘ |
right single quotation mark | &rsquo; | &#8217; | &#x2019; | ’ | ’ | ’ |
single low-9 quotation mark | &sbquo; | &#8218; | &#x201A; | ‚ | ‚ | ‚ |
left double quotation mark | &ldquo; | &#8220; | &#x201C; | “ | “ | “ |
right double quotation mark | &rdquo; | &#8221; | &#x201D; | ” | ” | ” |
double low-9 quotation mark | &bdquo; | &#8222; | &#x201E; | „ | „ | „ |
dagger | &dagger; | &#8224; | &#x2020; | † | † | † |
double dagger | &Dagger; | &#8225; | &#x2021; | ‡ | ‡ | ‡ |
per mille sign | &permil; | &#8240; | &#x2030; | ‰ | ‰ | ‰ |
single left-pointing angle quotation mark | &lsaquo; | &#8249; | &#x2039; | ‹ | ‹ | ‹ |
single right-pointing angle quotation mark | &rsaquo; | &#8250; | &#x203A; | › | › | › |
euro sign | &euro; | &#8364; | &#x20AC; | € | € | € |
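The decimal and hexadecimal columns of such a table are two spellings of the same code point, so they can be generated from a map of names to code points. The following sketch assumes a small hand-picked map (NAMED_CODE_POINTS) and a hypothetical helper (tableRow); neither comes from a library:

```javascript
// Illustrative subset of entity names and their Unicode code points.
const NAMED_CODE_POINTS = { quot: 0x22, amp: 0x26, lt: 0x3C, gt: 0x3E, euro: 0x20AC };

// Produce the name, decimal and hexadecimal notations for one entity name.
function tableRow(name) {
  const codePoint = NAMED_CODE_POINTS[name];
  const hex = codePoint.toString(16).toUpperCase();
  return `&${name}; | &#${codePoint}; | &#x${hex};`;
}

console.log(tableRow('euro')); // &euro; | &#8364; | &#x20AC;
console.log(tableRow('lt'));   // &lt; | &#60; | &#x3C;
```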
Glyphs of the characters are available from the Unicode Consortium and should already be available in every browser.
// Encode every character of a string as a decimal entity
function toEntities(text) {
  const entities = [];
  // Iterate by code point so that characters above U+FFFF are not split into surrogates
  for (const char of text) {
    entities.push(`&#${char.codePointAt(0)};`);
  }
  return entities.join('');
}

const reservedCharacters = `"><`;
const entities = toEntities(reservedCharacters);
console.log(`The reserved characters (${reservedCharacters}) in entity format are (${entities})`);
// The encoded characters can be used safely inside an attribute value
const anchorHTML = `<a href="#" title="${entities}">Anchor with entities</a> Keep your mouse on the link to see the title tooltip.`;
document.body.insertAdjacentHTML('afterbegin', anchorHTML);
When decoding, your function should take into account the three formats (name, decimal and hexadecimal).
The JavaScript function below shows an example for the decimal form that uses a basic regular expression replace:
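function decodeDecimalEntity(text) {
  // Replace every decimal entity (&#dddd;) with the character it represents
  return text.replace(/&#(\d+);/g, function(match, dec) {
    // fromCodePoint also handles code points above U+FFFF
    // (it throws a RangeError for values above 0x10FFFF)
    return String.fromCodePoint(parseInt(dec, 10));
  });
}
console.log(decodeDecimalEntity('&#62;')); // >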
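To cover the three formats, the same replace approach can be extended. This is a minimal sketch and not a complete decoder: the map NAMED_ENTITIES is a small illustrative subset (the real list of named character references is much longer), and the names decodeEntities and toChar are assumptions made for this example.

```javascript
// Small illustrative subset of the named character references.
const NAMED_ENTITIES = new Map([
  ['quot', '"'], ['amp', '&'], ['lt', '<'], ['gt', '>'], ['euro', '€'],
]);

// Convert a code point to a character, keeping the original text
// when the value is outside the Unicode range.
const toChar = (codePoint, fallback) =>
  codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : fallback;

function decodeEntities(text) {
  const pattern = /&(?:#[xX]([0-9a-fA-F]+)|#(\d+)|([A-Za-z][A-Za-z0-9]*));/g;
  return text.replace(pattern, (match, hex, dec, name) => {
    if (hex !== undefined) return toChar(parseInt(hex, 16), match);
    if (dec !== undefined) return toChar(parseInt(dec, 10), match);
    // Unknown names are left untouched
    return NAMED_ENTITIES.has(name) ? NAMED_ENTITIES.get(name) : match;
  });
}

console.log(decodeEntities('&lt;p&gt; &#9742; &#x260E; &euro; &unknown;'));
// <p> ☎ ☎ € &unknown;
```

Because the text is decoded in a single pass, an input such as &amp;lt; becomes &lt; and is not decoded a second time.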
Libraries already provide encode/decode functions and may add extra functionality.
A library may also implement a mapping from an ASCII sequence of characters to an entity.
In a font, this kind of mapping is called a ligature.
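A rough sketch of such a mapping could look like the following. The sequences chosen here and the names SEQUENCES and replaceSequences are purely hypothetical; a real library would define its own table and matching rules.

```javascript
// Hypothetical mapping from ASCII sequences to entities (illustrative choices only).
const SEQUENCES = [
  ['(c)', '&copy;'],
  ['->', '&rarr;'],
  ['--', '&ndash;'],
];

// Replace every occurrence of each ASCII sequence with its entity.
function replaceSequences(text) {
  let result = text;
  for (const [sequence, entity] of SEQUENCES) {
    result = result.split(sequence).join(entity);
  }
  return result;
}

console.log(replaceSequences('(c) A -> B, pages 1--2'));
// &copy; A &rarr; B, pages 1&ndash;2
```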
For instance:
List: