About
The XML Infoset:
- is a tree-based hierarchical representation of an XML document.
- is the abstract data and metadata of the document ("abstract" meaning independent of any particular representation or technical implementation)
- represents the significant information of an XML document
Having an infoset does not mean that an XML document conforms to an XSD. A document has an infoset as soon as it is well-formed and satisfies the namespace constraints, whether or not it is valid.
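To illustrate the difference between well-formed and valid, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, a non-validating parser. The `<order>`/`<price>` document, the `is_well_formed` helper and the idea of an XSD typing `<price>` as `xs:decimal` are all hypothetical. The parser builds a tree for any well-formed input without consulting a schema, and rejects input that is not well-formed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser can build a tree, i.e. the text is well-formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical document: well-formed, but it would be invalid against an
# assumed XSD declaring <price> as xs:decimal. ElementTree never sees a
# schema, so it parses the document anyway.
well_formed_doc = "<order><price>not a number</price></order>"
root = ET.fromstring(well_formed_doc)
print(root.find("price").text)

# Mismatched end tag: not well-formed, so no tree (and no infoset) exists.
broken = "<order><price>10</order>"
print(is_well_formed(well_formed_doc), is_well_formed(broken))
```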
Types of information items
An XML document's information set consists of a number of information items.
The information set for any well-formed XML document will contain at least a document information item and several others.
An information set can contain up to eleven different types of information items:
- The Document Information Item (always present)
- Element Information Items
- Attribute Information Items
- Processing Instruction Information Items
- Unexpanded Entity Reference Information Items
- Character Information Items
- Comment Information Items
- The Document Type Declaration Information Item
- Unparsed Entity Information Items
- Notation Information Items
- Namespace Information Items
Among them are information items representing:
- the document
- its elements
- processing instructions
- unexpanded entity references
- the document type declaration
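A DOM tree is one concrete way to look at these items. The sketch below (Python, standard-library `xml.dom.minidom`; the sample document, the `LABELS` table and the `describe` helper are made up for illustration) walks a parsed document and labels each node. DOM nodes only approximate infoset items: a DOM text node groups several character information items, for instance, and namespace information items are not exposed as separate nodes.

```python
from xml.dom import Node, minidom

# Hypothetical sample document exercising several kinds of items.
DOC = """<?xml version="1.0"?>
<!DOCTYPE note>
<?render mode="plain"?>
<!-- a comment -->
<note lang="en">Hi</note>"""

# DOM node types mapped to the infoset item they roughly correspond to.
LABELS = {
    Node.DOCUMENT_NODE: "document",
    Node.DOCUMENT_TYPE_NODE: "document type declaration",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.COMMENT_NODE: "comment",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "characters",
}


def describe(node, out):
    """Append a label for node, then its attributes, then its children (depth-first)."""
    out.append(LABELS.get(node.nodeType, f"node type {node.nodeType}"))
    if node.nodeType == Node.ELEMENT_NODE:
        for name, value in node.attributes.items():
            out.append(f"attribute {name}={value}")
    for child in node.childNodes:
        describe(child, out)
    return out


labels = describe(minidom.parseString(DOC), [])
print(labels)
```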
Representation
The XML syntax is just one way of representing the data described by an infoset.
The same infoset may exist:
- in memory as a DOM tree
- as an XML document encoded in UTF-8 or UTF-16
- …
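As a sketch of "same infoset, different bytes", the snippet below (Python standard library; the `<greeting>` document and the `encode_doc` helper are illustrative) serializes one document in UTF-8 and in UTF-16, parses both byte strings, and checks that the resulting trees are identical even though the bytes differ.

```python
import xml.etree.ElementTree as ET

# Illustrative document; {enc} is filled in with the declared encoding.
TEMPLATE = '<?xml version="1.0" encoding="{enc}"?><greeting lang="fr">Bonjour, ça va ?</greeting>'


def encode_doc(encoding: str) -> bytes:
    """Serialize the same document using the given character encoding."""
    return TEMPLATE.format(enc=encoding).encode(encoding)


utf8_bytes = encode_doc("UTF-8")
utf16_bytes = encode_doc("UTF-16")  # Python's UTF-16 codec prepends a byte order mark

utf8_root = ET.fromstring(utf8_bytes)
utf16_root = ET.fromstring(utf16_bytes)

# Different byte sequences, same tree: re-serializing both gives identical output.
print(utf8_bytes == utf16_bytes)
print(ET.tostring(utf8_root) == ET.tostring(utf16_root))
```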
Because the infoset is independent of the syntax, some syntactic details are not part of it. For example, the infoset does not distinguish between the two forms of an empty element.
The following are considered equivalent according to the XML Infoset.
<test></test>
<test/>
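One way to check this kind of equivalence in code is to compare canonical forms. The sketch below uses `xml.etree.ElementTree.canonicalize` (Python 3.8+), which implements Canonical XML 2.0 rather than the Infoset itself, but likewise discards the choice of empty-element syntax: both inputs come out as a start-tag/end-tag pair.

```python
import xml.etree.ElementTree as ET

# Canonical XML writes every empty element as a start-tag/end-tag pair,
# so the two syntactic forms produce the same canonical string.
long_form = ET.canonicalize("<test></test>")
short_form = ET.canonicalize("<test/>")
print(long_form == short_form)

# ElementTree's own tree model agrees: re-serializing either parse result
# gives the same bytes, since neither element has text or children.
same_tree = ET.tostring(ET.fromstring("<test></test>")) == ET.tostring(ET.fromstring("<test/>"))
print(same_tree)
```"
assert same_tree
</test>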
Augmentation
Infoset augmentation or infoset modification refers to the process of modifying the infoset during schema validation, for example by adding default attributes.
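Python's standard library has no XSD validator, so the sketch below shows a closely related mechanism instead: attribute defaulting from a DTD. The internal subset declares a default for a hypothetical `currency` attribute, and the parser (`xml.etree.ElementTree`, backed by expat) adds that attribute to any `<price>` element that omits it. With XSD validation, an attribute declared with a default value is likewise added to the augmented infoset.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document. The internal DTD subset declares that
# <price> has a "currency" attribute whose default value is "EUR".
DOC = """<!DOCTYPE order [
  <!ATTLIST price currency CDATA "EUR">
]>
<order><price>10</price><price currency="USD">12</price></order>"""

root = ET.fromstring(DOC)

# The first <price> never writes currency="..."; the parser supplies the
# declared default. The second keeps the value written in the document.
prices = [(p.text, p.get("currency")) for p in root.iter("price")]
print(prices)
```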