About

This page describes the character in XML.

Characters are denoted using the notation used in the Unicode Standard, that is, an optional U+ followed by their hexadecimal number, using at least 4 digits, such as “U+1234” or “U+10FFFD”. In XML or HTML this could be expressed as “&#x1234;” or “&#x10FFFD;”.
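The notation can be illustrated with a small Python sketch. The helper names `to_unicode_notation` and `to_char_reference` are hypothetical and exist only for this example; they turn one character into its U+ form and into a hexadecimal character reference.

```python
def to_unicode_notation(ch: str) -> str:
    """Return the U+ notation of one character, padded to at least 4 hex digits."""
    return f"U+{ord(ch):04X}"


def to_char_reference(ch: str) -> str:
    """Return the hexadecimal character reference of one character."""
    return f"&#x{ord(ch):X};"


for ch in ("\u1234", "\U0010FFFD"):
    print(to_unicode_notation(ch), to_char_reference(ch))
# U+1234 &#x1234;
# U+10FFFD &#x10FFFD;
```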

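Legal characters are tab (U+0009), line feed (U+000A), carriage return (U+000D), and the characters in the ranges U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF (the Char production of XML 1.0).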

XML processors MUST accept any character in the range specified for Char. All XML processors MUST accept the UTF-8 and UTF-16 encodings of Unicode.
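As a rough illustration, the check below tests whether a single character falls in the Char range. It assumes the ranges of the XML 1.0 Char production (XML 1.1 differs), and the function name `is_xml_char` is made up for this sketch.

```python
def is_xml_char(ch: str) -> bool:
    """Return True if ch matches the XML 1.0 Char production (assumed ranges)."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # excludes the surrogate block
        or 0xE000 <= cp <= 0xFFFD      # excludes U+FFFE and U+FFFF
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )


print(is_xml_char("\t"))    # True
print(is_xml_char("\x00"))  # False
```

A writer could run such a check over text before serializing it, since a character outside this range cannot be made legal by writing it as a character reference.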

Type of character

Reference

A character reference refers to a specific character in the Unicode character set, for example one that is not directly accessible from the available input devices.

Syntax

'&#' [0-9]+ ';'

or

'&#x' [0-9a-fA-F]+ ';'
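The two forms above translate almost directly into a regular expression. The following Python sketch is only an illustration: `CHAR_REF` and `decode_char_refs` are hypothetical names, and the function does not check that the resulting character is legal in XML.

```python
import re

# Decimal form in group 1, hexadecimal form in group 2.
CHAR_REF = re.compile(r"&#(?:([0-9]+)|x([0-9a-fA-F]+));")


def decode_char_refs(text: str) -> str:
    """Replace every character reference in text by the character it denotes."""
    def replace(match: re.Match) -> str:
        decimal, hexadecimal = match.group(1), match.group(2)
        code_point = int(decimal) if decimal is not None else int(hexadecimal, 16)
        return chr(code_point)  # raises ValueError above U+10FFFF
    return CHAR_REF.sub(replace, text)


print(decode_char_refs("&#60; and &#x3C;"))  # < and <
```

Note that the `x` of the hexadecimal form is lowercase in the syntax, so a string such as `&#X3C;` is left untouched by this sketch.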

Example

An example of a character reference:

Type <key>less-than</key> (&#x3C;) to save options.

where 3C is the hexadecimal code point of the LESS-THAN SIGN (U+003C), a math symbol.
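To see the reference being resolved, the example can be fed to a parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the wrapping `<p>` element is an assumption added here so that the fragment is a well-formed document.

```python
import xml.etree.ElementTree as ET

# The <p> wrapper is not part of the original example.
doc = "<p>Type <key>less-than</key> (&#x3C;) to save options.</p>"
root = ET.fromstring(doc)

# The parser hands back the referenced character, not the reference itself.
print("".join(root.itertext()))  # Type less-than (<) to save options.
```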

Data

Character data is all text that is not XML markup or a comment.

You can explicitly declare text that looks like markup or a comment to be character data by enclosing it in a character data (CDATA) section.
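A short Python sketch of this behaviour, using the standard `xml.etree.ElementTree` parser. The element name `example` and its content are invented for the illustration.

```python
import xml.etree.ElementTree as ET

doc = (
    "<example><![CDATA[<greeting>Hello</greeting> <!-- not a comment -->]]>"
    "</example>"
)
root = ET.fromstring(doc)

# Everything inside the CDATA section is returned as plain text.
print(root.text)  # <greeting>Hello</greeting> <!-- not a comment -->
print(len(root))  # 0, because no child element was created
```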

Special

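The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section.

If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings “&amp;” and “&lt;” respectively.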
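In Python, one way to apply this escaping is the standard `xml.sax.saxutils.escape` function, which replaces `&`, `<` and `>` by `&amp;`, `&lt;` and `&gt;`. The sample text and the `note` element below are made up for this sketch.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "AT&T: 1 < 2"
escaped = escape(raw)
print(escaped)  # AT&amp;T: 1 &lt; 2

# Once escaped, the text parses and round-trips to the original string.
root = ET.fromstring(f"<note>{escaped}</note>")
print(root.text == raw)  # True

# The literal form is rejected by the parser.
try:
    ET.fromstring(f"<note>{raw}</note>")
    literal_accepted = True
except ET.ParseError:
    literal_accepted = False
print(literal_accepted)  # False
```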

The right angle bracket (>) may be represented using the string “&gt;”, and MUST, for compatibility, be escaped using either “&gt;” or a character reference when it appears in the string “]]>” in content, when that string is not marking the end of a CDATA section.
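A minimal sketch of that last rule, assuming that `&` and `<` have already been escaped by other means. The helper name `escape_cdata_end` is hypothetical; it touches only the right angle bracket that closes a “]]>” sequence.

```python
import xml.etree.ElementTree as ET


def escape_cdata_end(text: str) -> str:
    """Escape '>' only where it would complete the string ']]>'."""
    return text.replace("]]>", "]]&gt;")


raw = "a[b[0]]>c"
escaped = escape_cdata_end(raw)
print(escaped)  # a[b[0]]&gt;c

# The escaped content parses and yields the original text.
root = ET.fromstring(f"<code>{escaped}</code>")
print(root.text == raw)  # True

# A right angle bracket elsewhere is left as it is.
print(escape_cdata_end("x > y"))  # x > y
```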

Documentation / Reference