What is XML / HTML Character Entity encoding?

About

An entity in HTML is a string that represents a Unicode character.

In other words, an entity is a notation written only with ASCII characters that can represent any Unicode character.

Encoding text in HTML means transforming the text characters into HTML entities.

Example

Complex Character: Phone

Example with the phone character. This character has the Unicode value:

0260E in hexadecimal
9742 in decimal
and the entity name phone.

To show a phone in an HTML document, you can write any of the following entity notations:
<ul>
  <li>&#x0260E; (hexadecimal)</li>
  <li>&#9742; (decimal)</li>
  <li>&phone; (name)</li>
</ul>

This renders the same phone character (☎) three times, once per notation.

Simple Character: letter A

This example shows that any simple character (for instance a letter of the alphabet) can also be written as an entity.

Example with the letter A. This character has the Unicode value:

41 in hexadecimal
65 in decimal
and no entity name.

Therefore, to show the letter A in an HTML document, you can write any of the following notations:
<ul>
  <li>&#x41; (hexadecimal)</li>
  <li>&#65; (decimal)</li>
  <li>A (the letter A)</li>
</ul>

This renders the letter A three times, once per notation.
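
The decimal and hexadecimal notations of both examples can be derived from the character itself. The following is a minimal sketch: the function name toNumericEntities is only an illustration and is not part of any library. It reads the Unicode code point of the character and formats it in both bases.

```javascript
// Hypothetical helper: builds the decimal and hexadecimal entity of one character.
function toNumericEntities(character) {
  // codePointAt(0) returns the full Unicode code point of the first character
  const codePoint = character.codePointAt(0);
  return {
    decimal: `&#${codePoint};`,
    hexadecimal: `&#x${codePoint.toString(16).toUpperCase()};`,
  };
}

console.log(toNumericEntities('☎')); // { decimal: '&#9742;', hexadecimal: '&#x260E;' }
console.log(toNumericEntities('A')); // { decimal: '&#65;', hexadecimal: '&#x41;' }
```

Leading zeros are optional in the hexadecimal notation, so &#x260E; and &#x0260E; represent the same character.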

Usage

Reserved Character Encoding

Entities are used to encode the reserved XML/HTML characters that appear in text or in the value of an attribute.

For instance, the start character < and the end character > of an element tag cannot be used directly. They need to be replaced (ie encoded) with their entity notation.

For instance, the character > would be replaced by the entity &gt;
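
This replacement can be sketched as a small function. The names RESERVED_ENTITIES and escapeReserved are assumptions made for this illustration. The sketch covers only the four characters &, <, > and ", and is not a complete escaping routine.

```javascript
// Hypothetical mapping of the reserved characters to their named entities.
const RESERVED_ENTITIES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' };

function escapeReserved(text) {
  // A single pass, so the & of an entity that was just produced is not encoded again
  return text.replace(/[&<>"]/g, (character) => RESERVED_ENTITIES[character]);
}

console.log(escapeReserved('1 < 2 && "b" > "a"'));
// 1 &lt; 2 &amp;&amp; &quot;b&quot; &gt; &quot;a&quot;
```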

Complex Characters

Entities are also used to show complex or special characters that are not easily accessible from the keyboard.

Format

The entity notation supports three definitions for a character:

&name; <!-- name notation -->
&#dddd;  <!-- decimal notation -->
&#xhhhh; <!-- hexadecimal notation -->

where:

name is a character name, also known as an entity reference ¹⁾
dddd is the Unicode code point in decimal form
hhhh is the Unicode code point in hexadecimal form

Old browsers may not support every entity, but support in recent browsers is good.
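
The three notations can be told apart with regular expressions. The sketch below uses a hypothetical function name, entityNotation. It checks the syntax only and does not verify that a name really exists in the list of named character references.

```javascript
// Hypothetical helper: tells which of the three notations an entity uses.
function entityNotation(entity) {
  if (/^&#[xX][0-9a-fA-F]+;$/.test(entity)) return 'hexadecimal';
  if (/^&#[0-9]+;$/.test(entity)) return 'decimal';
  if (/^&[a-zA-Z][a-zA-Z0-9]*;$/.test(entity)) return 'name';
  return null; // the string does not follow any entity notation
}

console.log(entityNotation('&phone;'));   // name
console.log(entityNotation('&#9742;'));   // decimal
console.log(entityNotation('&#x0260E;')); // hexadecimal
```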

List

This list is not exhaustive. See the named character reference for all names.

Character Description	Entity Name	Decimal	Hex	Rendering
quotation mark = APL quote	&quot;	&#34;	&#x22;	"
ampersand	&amp;	&#38;	&#x26;	&
less-than sign	&lt;	&#60;	&#x3C;	<
greater-than sign	&gt;	&#62;	&#x3E;	>
Latin capital ligature OE	&OElig;	&#338;	&#x152;	Œ
Latin small ligature oe	&oelig;	&#339;	&#x153;	œ
Latin capital letter S with caron	&Scaron;	&#352;	&#x160;	Š
Latin small letter s with caron	&scaron;	&#353;	&#x161;	š
Latin capital letter Y with diaeresis	&Yuml;	&#376;	&#x178;	Ÿ
modifier letter circumflex accent	&circ;	&#710;	&#x2C6;	ˆ
small tilde	&tilde;	&#732;	&#x2DC;	˜
en space	&ensp;	&#8194;	&#x2002;	(space)
em space	&emsp;	&#8195;	&#x2003;	(space)
thin space	&thinsp;	&#8201;	&#x2009;	(space)
zero width non-joiner	&zwnj;	&#8204;	&#x200C;	(invisible)
zero width joiner	&zwj;	&#8205;	&#x200D;	(invisible)
left-to-right mark	&lrm;	&#8206;	&#x200E;	(invisible)
right-to-left mark	&rlm;	&#8207;	&#x200F;	(invisible)
en dash	&ndash;	&#8211;	&#x2013;	–
em dash	&mdash;	&#8212;	&#x2014;	—
left single quotation mark	&lsquo;	&#8216;	&#x2018;	‘
right single quotation mark	&rsquo;	&#8217;	&#x2019;	’
single low-9 quotation mark	&sbquo;	&#8218;	&#x201A;	‚
left double quotation mark	&ldquo;	&#8220;	&#x201C;	“
right double quotation mark	&rdquo;	&#8221;	&#x201D;	”
double low-9 quotation mark	&bdquo;	&#8222;	&#x201E;	„
dagger	&dagger;	&#8224;	&#x2020;	†
double dagger	&Dagger;	&#8225;	&#x2021;	‡
per mille sign	&permil;	&#8240;	&#x2030;	‰
single left-pointing angle quotation mark	&lsaquo;	&#8249;	&#x2039;	‹
single right-pointing angle quotation mark	&rsaquo;	&#8250;	&#x203A;	›
euro sign	&euro;	&#8364;	&#x20AC;	€

The glyphs of the characters are available at the Unicode Consortium and should already be available in every browser.

Function / Library

Function

Encode

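A JavaScript function that encodes text into entities:

function toEntities(text) {
    const entities = [];
    // Iterate by code point (not by UTF-16 unit) so that a character
    // outside the Basic Multilingual Plane gets a single, valid entity
    for (const character of text) {
        entities.push(`&#${character.codePointAt(0)};`);
    }
    return entities.join('');
}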

Function Example

const reservedCharacters = `"><`;
const entities = toEntities(reservedCharacters);
console.log(`The reserved characters (${reservedCharacters}) in entity format are (${entities})`);

You can then also use them in an attribute value of an HTML string, for instance in the title of an anchor:

let anchorHTML = `<a href="#" title="${entities}">Anchor with entities</a> Keep your mouse on the link to see the title tooltip.`;
document.body.insertAdjacentHTML('afterbegin', anchorHTML);

Output: the console shows the entities &#34;&#62;&#60; and the title tooltip of the anchor shows the reserved characters "><

Decode

When decoding, your function should take into account the three formats (name, decimal and hexadecimal).

The JavaScript function below shows an example for the decimal form. It just uses a basic regular expression replace function.

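function decodeDecimalEntity(text) {
  // Replace every decimal entity (&#ddd;) with the character of that code point
  return text.replace(/&#(\d+);/g, function (match, dec) {
    const codePoint = Number(dec);
    // Leave the entity untouched when the number is above the Unicode range
    return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
  });
}
console.log(decodeDecimalEntity('&#62;')); // >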
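
A decoder that takes the three formats into account could look like the sketch below. The names NAMED_ENTITIES and decodeEntities are assumptions made for this illustration, and the name table holds only a tiny sample of the named character references. A real decoder needs the complete list.

```javascript
// Assumption: a tiny sample of names, the complete list is much longer.
const NAMED_ENTITIES = new Map([
  ['quot', '"'], ['amp', '&'], ['lt', '<'], ['gt', '>'], ['phone', '\u260E'],
]);

function decodeEntities(text) {
  const pattern = /&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/g;
  return text.replace(pattern, (match, body) => {
    let codePoint;
    if (/^#[xX]/.test(body)) {
      codePoint = parseInt(body.slice(2), 16); // hexadecimal notation
    } else if (body.startsWith('#')) {
      codePoint = parseInt(body.slice(1), 10); // decimal notation
    } else {
      // name notation: unknown names are left untouched
      return NAMED_ENTITIES.has(body) ? NAMED_ENTITIES.get(body) : match;
    }
    // Numbers above the Unicode range are left untouched
    return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeEntities('&#x0260E; &#9742; &phone; &lt;b&gt; &unknown;'));
// ☎ ☎ ☎ <b> &unknown;
```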

Pure Library (Encode/Decode)

Libraries already provide the encode/decode functions and may add extra functionality.

PHP:
- htmlentities (encode)
- html_entity_decode (decode)
JavaScript:
- https://github.com/mathiasbynens/he

Library from Ascii to Entities

A library may also implement a mapping from an ASCII sequence of characters to an entity.

In a font, this kind of mapping is called a ligature.

For instance:

-- into the en-dash entity &ndash;
--- into the em-dash entity &mdash;
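
Such a mapping can be sketched with a single regular expression. The function name dashesToEntities is hypothetical, and the sketch handles only the two dash sequences listed above.

```javascript
// Hypothetical helper: maps ASCII dash sequences to dash entities.
function dashesToEntities(text) {
  // The longer sequence is listed first, otherwise --- would match as -- followed by -
  return text.replace(/---|--/g, (dashes) => (dashes === '---' ? '&mdash;' : '&ndash;'));
}

console.log(dashesToEntities('pages 10--20 --- see the index'));
// pages 10&ndash;20 &mdash; see the index
```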

List:

¹⁾ named character reference
