About
This page is about characters in XML.
Characters are denoted using the notation used in the Unicode Standard, that is, an optional U+ followed by their hexadecimal number, using at least 4 digits, such as “U+1234” or “U+10FFFD”. In XML or HTML these characters could be expressed as the character references “&#x1234;” or “&#x10FFFD;”.
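A minimal Python sketch of the two notations side by side. The function names `unicode_notation` and `xml_hex_reference` are hypothetical helpers for illustration, not part of any library.

```python
def unicode_notation(code_point: int) -> str:
    """Format a code point as U+ followed by at least 4 uppercase hex digits."""
    return f"U+{code_point:04X}"


def xml_hex_reference(code_point: int) -> str:
    """Format a code point as an XML/HTML hexadecimal character reference."""
    return f"&#x{code_point:X};"


for cp in (0x1234, 0x10FFFD):
    print(unicode_notation(cp), xml_hex_reference(cp))
# U+1234 &#x1234;
# U+10FFFD &#x10FFFD;
```

The `:04X` format spec pads short code points such as U+003C to four digits while leaving longer ones such as U+10FFFD untouched.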
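Legal characters are:
- tab (U+0009),
- carriage return (U+000D),
- line feed (U+000A),
- and the legal characters of Unicode (ISO/IEC 10646), which in XML 1.0 are the ranges U+0020 to U+D7FF, U+E000 to U+FFFD and U+10000 to U+10FFFF.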
XML processors MUST accept any character in the range specified for Char, the production of the XML grammar that lists the legal characters above. All XML processors MUST accept the UTF-8 and UTF-16 encodings of Unicode.
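The Char production can be checked directly. The following is a minimal Python sketch of the XML 1.0 rule; `is_xml_char` and `invalid_xml_chars` are hypothetical names chosen for illustration.

```python
# XML 1.0 (Fifth Edition) Char production as inclusive (low, high) ranges:
# Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
XML10_CHAR_RANGES = (
    (0x9, 0x9),          # tab
    (0xA, 0xA),          # line feed
    (0xD, 0xD),          # carriage return
    (0x20, 0xD7FF),      # excludes the other C0 controls and the surrogates
    (0xE000, 0xFFFD),    # excludes U+FFFE and U+FFFF
    (0x10000, 0x10FFFF),
)


def is_xml_char(code_point: int) -> bool:
    """Return True if code_point matches the XML 1.0 Char production."""
    return any(low <= code_point <= high for low, high in XML10_CHAR_RANGES)


def invalid_xml_chars(text: str) -> list[str]:
    """Return the U+ notation of every character in text that XML 1.0 forbids."""
    return [f"U+{ord(c):04X}" for c in text if not is_xml_char(ord(c))]


print(invalid_xml_chars("ok\tline\x00end\uFFFE"))  # ['U+0000', 'U+FFFE']
```

A check like this is useful before serializing arbitrary strings, because escaping cannot make a character such as U+0000 legal in XML 1.0.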
Types of character
Reference
A character reference refers to a specific character in the Unicode (ISO/IEC 10646) character set by its code point, for example a character that is not directly accessible from available input devices.
Syntax
'&#' [0-9]+ ';'   (decimal code point, for example &#60;)
or
'&#x' [0-9a-fA-F]+ ';'   (hexadecimal code point, for example &#x3C;)
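The two forms of the syntax can be decoded with a single regular expression. This is a minimal Python sketch; `decode_char_ref` is a hypothetical helper, and it also applies the XML 1.0 rule that a reference must point to a legal character (Char).

```python
import re

# CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
# Only a lowercase "x" is allowed; the hex digits may be either case.
CHAR_REF = re.compile(r"&#(?:([0-9]+)|x([0-9a-fA-F]+));")


def decode_char_ref(ref: str) -> str:
    """Return the character denoted by one complete character reference."""
    match = CHAR_REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a character reference: {ref!r}")
    decimal, hexadecimal = match.groups()
    code_point = int(decimal, 10) if decimal is not None else int(hexadecimal, 16)
    # Well-formedness constraint "Legal Character": the target must match Char.
    if not (code_point in (0x9, 0xA, 0xD)
            or 0x20 <= code_point <= 0xD7FF
            or 0xE000 <= code_point <= 0xFFFD
            or 0x10000 <= code_point <= 0x10FFFF):
        raise ValueError(f"reference to an illegal character: {ref!r}")
    return chr(code_point)


print(decode_char_ref("&#60;"), decode_char_ref("&#x3C;"))  # < <
```

Both forms name the same code point: decimal 60 equals hexadecimal 3C.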
Example
A sentence containing a character reference:
Type <key>less-than</key> (&#x3C;) to save options.
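where 3C is the hexadecimal code point of the LESS-THAN SIGN (U+003C, general category Math Symbol), so a parser replaces &#x3C; with the character <.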
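To see the replacement happen, the example sentence can be parsed with Python's standard `xml.etree.ElementTree`. This is a minimal sketch; the wrapping `<p>` element is an assumption added so the fragment forms a complete document.

```python
import xml.etree.ElementTree as ET

# The example sentence, wrapped in a hypothetical <p> root element.
doc = "<p>Type <key>less-than</key> (&#x3C;) to save options.</p>"
root = ET.fromstring(doc)
key = root.find("key")

print(repr(key.tail))             # ' (<) to save options.'
print("".join(root.itertext()))   # Type less-than (<) to save options.
# On output, ElementTree escapes the character again, using &lt; this time.
print(ET.tostring(root, encoding="unicode"))
# <p>Type <key>less-than</key> (&lt;) to save options.</p>
```

After parsing, the reference is gone: the application only ever sees the character `<`, and the serializer chooses its own escaped form when writing it back.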
Data
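Character data is all text in a document that is not markup. Markup includes start-tags, end-tags, empty-element tags, entity and character references, comments, CDATA section delimiters, document type declarations and processing instructions.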
Text that would otherwise be recognized as markup can be explicitly marked as character data by placing it in a CDATA section (<![CDATA[ ... ]]>).
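A minimal Python sketch with the standard `xml.etree.ElementTree` parser, showing that the content of a CDATA section comes back as plain text rather than as elements or references. The `<example>` element is a hypothetical wrapper.

```python
import xml.etree.ElementTree as ET

# Inside the CDATA section, <b> is not a tag and & does not start a reference.
doc = "<example><![CDATA[<b>not a tag</b> & not an entity]]></example>"
root = ET.fromstring(doc)

print(root.text)   # <b>not a tag</b> & not an entity
print(len(root))   # 0: no <b> child element was created
```

Note that ElementTree does not remember that the text came from a CDATA section; serializing `root` again produces escaped character data instead.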
Special
The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except:
- when used as markup delimiters,
- within a comment,
- within a processing instruction,
- or within a CDATA section.
If they are needed elsewhere, they MUST be escaped using either:
- numeric character references (for example &#38; and &#60;),
- or the strings “&amp;” and “&lt;” respectively.
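These rules can be observed with any conforming parser. The sketch below assumes Python's standard `xml.etree.ElementTree`, which raises `ET.ParseError` for documents that are not well-formed; `is_well_formed` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if ElementTree's parser accepts doc as well-formed XML."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a>fish & chips</a>"))          # False: literal & in content
print(is_well_formed("<a>1 < 2</a>"))                  # False: literal < in content
print(is_well_formed("<a>fish &amp; chips</a>"))      # True: predefined entity
print(is_well_formed("<a>fish &#38; chips</a>"))      # True: character reference
print(is_well_formed("<a><!-- fish & chips --></a>"))  # True: inside a comment
print(is_well_formed("<a><![CDATA[1 < 2 & 3]]></a>"))  # True: inside CDATA
print(is_well_formed("<a><?note 1 < 2 & 3?></a>"))    # True: inside a PI
```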
The right angle bracket (>) may be represented using the string “&gt;”, and MUST, for compatibility, be escaped using either “&gt;” or a character reference when it appears in the string “]]>” in content, when that string is not marking the end of a CDATA section.
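Putting the escaping rules together, a content escaper only has to handle &, < and the sequence ]]>. The following is a minimal Python sketch; `escape_content` is a hypothetical helper, and it escapes > only where required, whereas the standard `xml.sax.saxutils.escape` simply escapes every >.

```python
import xml.etree.ElementTree as ET


def escape_content(text: str) -> str:
    """Escape text for use as XML element content (character data)."""
    # & must be replaced first so the & of &lt; is not escaped a second time.
    text = text.replace("&", "&amp;").replace("<", "&lt;")
    # > is only required to be escaped when it completes "]]>".
    return text.replace("]]>", "]]&gt;")


original = "if a < b && c ]]> d > e"
escaped = escape_content(original)
print(escaped)  # if a &lt; b &amp;&amp; c ]]&gt; d > e
# Parsing the escaped text back yields the original string.
print(ET.fromstring(f"<a>{escaped}</a>").text == original)  # True
```

Escaping every > is also valid and is simpler to reason about; the selective version above only shows the minimum the rules demand.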