(XML|HTML) - Character Entity

About

An Entity in html is a string that represents a unicode character.

In other word, this is a fully qualified notation that represents any unicode character.

Example

Complex Character: Phone

Example with the phone. This character has the unicode value:

Example:

  • the following HTML
To show a phone in a HTML document, you can write the following entities notation:
<ul>
  <li>&#x0260E; (hexadecimal)</li>
  <li>&#9742; (decimal)</li>
  <li>&phone; (name)</li>
</ul>  
  • will output:

Simple Character: letter A

This example shows you that you can also write any simple character (ie from the alphabet) also in entity.

Example with the letter A. This character has the unicode value:

Example:

  • the following HTML
Therefore, to show the letter A in a HTML document, you can write the following entities notation:
<ul>
  <li>&#x41; (hexadecimal)</li>
  <li>&#65; (decimal)</li>
  <li>A (the letter A)</li>
</ul>  
  • will output:

Usage

Reserved Word Encoding

They are used to encode reserved XML/HTML character that are used in the value of an attribute.

For instance, the start < and end character > of an element tag cannot be used directly. They need to be replaced (ie encoded) in entity notation.

For instance, the character > would be replaced by the following entity &gt;

Complex Characters

They are also used to show complex / special characters that are not easily accessible from the keyboard.

Format

The entity notation supports three definitions for a character:

&name; <!-- name notation -->
&#dddd;  <!-- decimal notation -->
&#xhhhh; <!-- hexadecimal notation -->

where:

All entities may not be supported by old browsers but support in recent browsers is good.

List

This list is non-exhaustive, see the named character reference for all name

Character Description Entity Name Decimal Hex Rendering in Your Browser
Entity (Name) Unicode Decimal Unicode Hex
quotation mark = APL quote &quot; &#34; &#x22;
ampersand &amp; &#38; &#x26; & & &
less-than sign &lt; &#60; &#x3C; < < <
greater-than sign &gt; &#62; &#x3E; > > >
Latin capital ligature OE &OElig; &#338; &#x152; Œ Œ Œ
Latin small ligature oe &oelig; &#339; &#x153; œ œ œ
Latin capital letter S with caron &Scaron; &#352; &#x160; Š Š Š
Latin small letter s with caron &scaron; &#353; &#x161; š š š
Latin capital letter Y with diaeresis &Yuml; &#376; &#x178; Ÿ Ÿ Ÿ
modifier letter circumflex accent &circ; &#710; &#x2C6; ˆ ˆ ˆ
small tilde &tilde; &#732; &#x2DC; ˜ ˜ ˜
en space &ensp; &#8194; &#x2002;
em space &emsp; &#8195; &#x2003;
thin space &thinsp; &#8201; &#x2009;
zero width non-joiner &zwnj; &#8204; &#x200C;
zero width joiner &zwj; &#8205; &#x200D;
left-to-right mark &lrm; &#8206; &#x200E;
right-to-left mark &rlm; &#8207; &#x200F;
en dash &ndash; &#8211; &#x2013;
em dash &mdash; &#8212; &#x2014;
left single quotation mark &lsquo; &#8216; &#x2018;
right single quotation mark &rsquo; &#8217; &#x2019;
single low-9 quotation mark &sbquo; &#8218; &#x201A;
left double quotation mark &ldquo; &#8220; &#x201C;
right double quotation mark &rdquo; &#8221; &#x201D;
double low-9 quotation mark &bdquo; &#8222; &#x201E;
dagger &dagger; &#8224; &#x2020;
double dagger &Dagger; &#8225; &#x2021;
per mille sign &permil; &#8240; &#x2030;
single left-pointing angle quotation mark &lsaquo; &#8249; &#x2039;
single right-pointing angle quotation mark &rsaquo; &#8250; &#x203A;
euro sign &euro; &#8364; &#x20AC;

Glyphs of the characters are available at the Unicode Consortium and should be already available in every browser.

Function / Library

Function

Encode

  • A function in javascript to encode from text to entities
function toEntities(text) {
    let entities = [];
	for (let i=0;i<text.length;i++) {
	    let entity = `&#${text[i].charCodeAt()};`
	    entities.push(entity);
	}
    return entities.join('');
}
  • Function Example
let reservedCharacters= `"><`;
let entities = toEntities(reservedCharacters);
console.log(`The reserved characters (${reservedCharacters}) in entities format is (${entities})`);
  • You can then use them also in a HTML string attribute value. For instance in a title of an anchor
let anchorHTML = `<a href="#" title="${entities}">Anchor with entities</a> Keep your mouse on the link to see the title tooltip.`;
document.body.insertAdjacentHTML('afterbegin', anchorHTML);
  • Output: See the entities and see the reserved characters in the title attribute of the anchor

Decode

When decoding your function should take into account the three format (name, decimal and hexadecimal)

The below javascript function shows an example for the decimal form that just uses a basic regular expression replace function

function decodeDecimalEntity(text) {
  return text.replace(/&#(\d+);/g, function(match, dec) {
     return String.fromCharCode(dec);
  });
}

Pure Library (Encode/Decode)

Library have already the encode/decode functions and may add extra functionalities

Library from Ascii to Entities

Library may also implements a mapping between a ascii sequence of characters to an entity.

For instance:

  • -- into en-dash entity
  • --- into em-dash entity

List:


Powered by ComboStrap