About
An entity in HTML is a string that represents a Unicode character.
In other words, an entity is a notation made of plain ASCII characters that can represent any Unicode character.
Encoding text in HTML means transforming the text characters into HTML entities.
Example
Complex Character: Phone
Example with the phone character (☎). This character is identified by:
- the Unicode code point 260E in hexadecimal
- i.e. 9742 in decimal
- phone as entity name.
Example:
- To show a phone in an HTML document, you can write any of the following entity notations:
<ul>
<li>&#x260E; (hexadecimal)</li>
<li>&#9742; (decimal)</li>
<li>&phone; (name)</li>
</ul>
- All three list items will output the same character: ☎
Simple Character: letter A
This example shows that any simple character (i.e. a letter of the alphabet) can also be written as an entity.
Example with the letter A. This character is identified by:
- the Unicode code point 41 in hexadecimal
- i.e. 65 in decimal
- no entity name
Example:
- Therefore, to show the letter A in an HTML document, you can write any of the following notations:
<ul>
<li>&#x41; (hexadecimal)</li>
<li>&#65; (decimal)</li>
<li>A (the letter A)</li>
</ul>
- All three list items will output the same character: A
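The link between a character and its two numeric notations can be checked with a few lines of JavaScript. The following is only a minimal sketch: the helper name `numericNotations` is an assumption made for illustration, and it relies on the standard `String.prototype.codePointAt` method.

```javascript
// Hypothetical helper (not from the original text): returns the decimal and
// hexadecimal entity notations of the first character of a string.
function numericNotations(char) {
  const codePoint = char.codePointAt(0); // Unicode code point as a number
  return {
    decimal: `&#${codePoint};`,
    hexadecimal: `&#x${codePoint.toString(16).toUpperCase()};`,
  };
}

console.log(numericNotations('☎')); // { decimal: '&#9742;', hexadecimal: '&#x260E;' }
console.log(numericNotations('A')); // { decimal: '&#65;', hexadecimal: '&#x41;' }
```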
Usage
Reserved Word Encoding
Entities are used to encode reserved XML/HTML characters, for instance when they appear in the value of an attribute.
For instance, the start character < and the end character > of an element tag should not be used directly as text. They need to be replaced (i.e. encoded) with their entity notation.
For instance, the character > would be replaced by the entity &gt;
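As an illustration, the replacement of reserved characters could be sketched as follows. The names `RESERVED` and `escapeReserved` are assumptions chosen for this example, and the set of characters handled (&, <, > and ") is a common minimal choice rather than a complete rule.

```javascript
// Hypothetical mapping: reserved character -> named entity.
const RESERVED = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' };

// Hypothetical helper: replaces every reserved character in a single pass,
// so the & of an entity that was just produced is never encoded a second time.
function escapeReserved(text) {
  return text.replace(/[&<>"]/g, (char) => RESERVED[char]);
}

console.log(escapeReserved('a > b && "c" < d'));
// a &gt; b &amp;&amp; &quot;c&quot; &lt; d
```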
Complex Characters
They are also used to show complex / special characters that are not easily accessible from the keyboard.
Format
The entity notation supports three definitions for a character:
&name; <!-- name notation -->
&#dddd; <!-- decimal notation -->
&#xhhhh; <!-- hexadecimal notation -->
where:
- name is a character name (this notation is also known as an entity reference),
- dddd is the Unicode code point in decimal form,
- hhhh is the Unicode code point in hexadecimal form.
Old browsers may not support every entity, but support in recent browsers is good.
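The three notations can be told apart with simple regular expressions. The sketch below is an assumption-laden illustration: the helper name `entityKind` is hypothetical, and it only checks the shape of the string, not whether the name or code point actually exists.

```javascript
// Hypothetical helper: tells which of the three notations an entity uses.
function entityKind(entity) {
  // The hexadecimal test comes first because '&#x...;' also starts with '&#'
  if (/^&#x[0-9a-f]+;$/i.test(entity)) return 'hexadecimal';
  if (/^&#[0-9]+;$/.test(entity)) return 'decimal';
  if (/^&[a-z][a-z0-9]*;$/i.test(entity)) return 'name';
  return null; // not written in one of the three notations
}

console.log(entityKind('&phone;'));  // name
console.log(entityKind('&#9742;'));  // decimal
console.log(entityKind('&#x260E;')); // hexadecimal
```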
List
This list is non-exhaustive; see the named character reference for all names.
Character Description | Entity Name | Decimal | Hex | Rendering |
---|---|---|---|---|
quotation mark = APL quote | &quot; | &#34; | &#x22; | " |
ampersand | &amp; | &#38; | &#x26; | & |
less-than sign | &lt; | &#60; | &#x3C; | < |
greater-than sign | &gt; | &#62; | &#x3E; | > |
Latin capital ligature OE | &OElig; | &#338; | &#x152; | Œ |
Latin small ligature oe | &oelig; | &#339; | &#x153; | œ |
Latin capital letter S with caron | &Scaron; | &#352; | &#x160; | Š |
Latin small letter s with caron | &scaron; | &#353; | &#x161; | š |
Latin capital letter Y with diaeresis | &Yuml; | &#376; | &#x178; | Ÿ |
modifier letter circumflex accent | &circ; | &#710; | &#x2C6; | ˆ |
small tilde | &tilde; | &#732; | &#x2DC; | ˜ |
en space | &ensp; | &#8194; | &#x2002; | |
em space | &emsp; | &#8195; | &#x2003; | |
thin space | &thinsp; | &#8201; | &#x2009; | |
zero width non-joiner | &zwnj; | &#8204; | &#x200C; | |
zero width joiner | &zwj; | &#8205; | &#x200D; | |
left-to-right mark | &lrm; | &#8206; | &#x200E; | |
right-to-left mark | &rlm; | &#8207; | &#x200F; | |
en dash | &ndash; | &#8211; | &#x2013; | – |
em dash | &mdash; | &#8212; | &#x2014; | — |
left single quotation mark | &lsquo; | &#8216; | &#x2018; | ‘ |
right single quotation mark | &rsquo; | &#8217; | &#x2019; | ’ |
single low-9 quotation mark | &sbquo; | &#8218; | &#x201A; | ‚ |
left double quotation mark | &ldquo; | &#8220; | &#x201C; | “ |
right double quotation mark | &rdquo; | &#8221; | &#x201D; | ” |
double low-9 quotation mark | &bdquo; | &#8222; | &#x201E; | „ |
dagger | &dagger; | &#8224; | &#x2020; | † |
double dagger | &Dagger; | &#8225; | &#x2021; | ‡ |
per mille sign | &permil; | &#8240; | &#x2030; | ‰ |
single left-pointing angle quotation mark | &lsaquo; | &#8249; | &#x2039; | ‹ |
single right-pointing angle quotation mark | &rsaquo; | &#8250; | &#x203A; | › |
euro sign | &euro; | &#8364; | &#x20AC; | € |
Glyphs of the characters are available from the Unicode Consortium and should already be available in every browser.
Function / Library
Function
Encode
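- A JavaScript function that encodes text into entities (decimal notation)
function toEntities(text) {
  const entities = [];
  // for...of iterates by code point, so a character outside the Basic
  // Multilingual Plane (an emoji for instance) is not split into two halves
  for (const char of text) {
    entities.push(`&#${char.codePointAt(0)};`);
  }
  return entities.join('');
}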
- Function Example
let reservedCharacters= `"><`;
let entities = toEntities(reservedCharacters);
console.log(`The reserved characters (${reservedCharacters}) in entity format are (${entities})`);
- You can then also use them in an HTML attribute value, for instance in the title of an anchor
let anchorHTML = `<a href="#" title="${entities}">Anchor with entities</a> Keep your mouse on the link to see the title tooltip.`;
document.body.insertAdjacentHTML('afterbegin', anchorHTML);
- Output: the console shows the entities, and the title attribute of the anchor shows the reserved characters
Decode
When decoding, your function should take into account the three formats (name, decimal and hexadecimal).
The JavaScript function below shows an example for the decimal form only, using a basic regular expression replace:
function decodeDecimalEntity(text) {
  // Replace each &#dddd; with the character that has this decimal code point
  return text.replace(/&#(\d+);/g, function (match, dec) {
    // fromCodePoint also handles code points above 65535, unlike fromCharCode
    return String.fromCodePoint(Number(dec));
  });
}
console.log(decodeDecimalEntity('&#62;')); // >
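A decoder that covers the three formats could be sketched along the same lines. This is only an illustration under stated assumptions: the names `NAMED` and `decodeEntities` are hypothetical, the name table holds a handful of entries whereas the real list of names is far longer, and the numeric values are assumed to be valid code points.

```javascript
// Hypothetical, deliberately tiny name table (the real list is much longer).
const NAMED = new Map([
  ['quot', '"'], ['amp', '&'], ['lt', '<'], ['gt', '>'], ['phone', '\u260E'],
]);

// Hypothetical helper: decodes the name, decimal and hexadecimal notations.
function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, (match, body) => {
    // Hexadecimal notation: &#xhhhh;
    if (/^#x/i.test(body)) return String.fromCodePoint(parseInt(body.slice(2), 16));
    // Decimal notation: &#dddd;
    if (body[0] === '#') return String.fromCodePoint(parseInt(body.slice(1), 10));
    // Name notation: names are case sensitive, unknown names are left as-is
    return NAMED.has(body) ? NAMED.get(body) : match;
  });
}

console.log(decodeEntities('&gt; &#62; &#x3E;')); // > > >
```

A single `replace` pass is used so that the output of one substitution is never decoded again: `&amp;lt;` becomes `&lt;`, not `<`.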
Pure Library (Encode/Decode)
Libraries already provide the encode/decode functions and may add extra functionality.
- php:
- Javascript:
Library from ASCII to Entities
A library may also implement a mapping from a sequence of ASCII characters to an entity.
In a font, this kind of mapping is called a ligature.
For instance:
- -- into the en-dash entity &ndash;
- --- into the em-dash entity &mdash;
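The two dash mappings above can be sketched in a few lines. The helper name `dashesToEntities` is an assumption for illustration and does not come from any particular library; real libraries typically handle many more sequences.

```javascript
// Hypothetical helper: maps ASCII dash sequences to dash entities.
function dashesToEntities(text) {
  // The alternation tries the longer sequence first, so that '---' is not
  // read as '--' followed by a lone '-'
  return text.replace(/---|--/g, (dashes) => (dashes === '---' ? '&mdash;' : '&ndash;'));
}

console.log(dashesToEntities('pages 10--20 --- see above'));
// pages 10&ndash;20 &mdash; see above
```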
List: