
About

The XML Infoset:

  • is a tree-based representation of an XML document.
  • describes the document's data and metadata abstractly (abstract meaning independent of any particular representation or technical implementation).
  • represents the significant information of an XML document.

Having an infoset does not make an XML document valid. Any well-formed document has an infoset, whether or not it conforms to an XSD (or a DTD).
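Python's standard library has no XSD validator, so the sketch below only illustrates the well-formedness side: a document that parses has an infoset even when no schema is attached, and a malformed one has none. The helper name `has_infoset` is hypothetical.

```python
import xml.etree.ElementTree as ET

def has_infoset(text: str) -> bool:
    """Return True if the text is well-formed XML (and therefore has an infoset)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Well-formed, no schema involved: it still has an infoset.
print(has_infoset("<order><item qty='2'>pen</item></order>"))  # True
# Not well-formed (mismatched end tag): no infoset at all.
print(has_infoset("<order><item></order>"))  # False
```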

Types of information items

An XML document's information set consists of a number of information items.

The information set of any well-formed XML document will contain at least a document information item and several others, such as the element information item for the root element.

An information set can contain up to eleven different types of information items:

  • The Document Information Item (always present)
  • Element Information Items
  • Attribute Information Items
  • Processing Instruction Information Items
  • Unexpanded Entity Reference Information Items
  • Character Information Items
  • Comment Information Items
  • The Document Type Declaration Information Item
  • Unparsed Entity Information Items
  • Notation Information Items
  • Namespace Information Items
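Python does not expose the infoset directly, but a DOM tree gives a rough view of it. The sketch below, assuming `xml.dom.minidom` as the DOM and a made-up sample document, counts nodes using an approximate mapping to information item types. The mapping is a simplification: a DOM Text node groups many character information items, and namespace, entity and notation items are not covered.

```python
from collections import Counter
from xml.dom import Node, minidom

# Made-up sample document containing several kinds of information items.
DOC = """<?xml version="1.0"?>
<!DOCTYPE note>
<?style type="text/css"?>
<note lang="en"><!-- a comment -->Hi</note>"""

# Approximate mapping from DOM node types to infoset item types.
KIND = {
    Node.DOCUMENT_NODE: "document",
    Node.DOCUMENT_TYPE_NODE: "document type declaration",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "character (run)",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}

def count_items(node, counts):
    """Walk the DOM recursively and tally nodes per approximate item type."""
    counts[KIND.get(node.nodeType, "other")] += 1
    if node.nodeType == Node.ELEMENT_NODE:
        # Attributes are not children in the DOM, so count them separately.
        counts["attribute"] += len(node.attributes)
    for child in node.childNodes:
        count_items(child, counts)
    return counts

print(count_items(minidom.parseString(DOC), Counter()))
```

Note that the XML declaration on the first line is not a processing instruction and produces no information item of its own.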

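Each type represents one kind of construct: the document itself, its elements, attributes, character data, comments, processing instructions, namespaces, and the DTD-related declarations (document type declaration, unparsed entities, notations).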

Representation

Because the infoset is abstract, XML syntax is just one way of representing its data.

The same infoset may exist:

  • in memory, for example as a DOM tree
  • serialized as an XML document, encoded for example in UTF-8 or UTF-16.
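As a sketch of that independence, the Python snippet below serializes the same made-up document in UTF-8 and in UTF-16 and checks that both byte sequences parse to the same data. It assumes ElementTree as the in-memory tree (rather than a DOM), and `snapshot` is a hypothetical helper that reduces a tree to plain, comparable data.

```python
import xml.etree.ElementTree as ET

TEXT = '<?xml version="1.0" encoding="{enc}"?><greeting lang="fr">Ça va</greeting>'

def snapshot(elem):
    """Reduce an element tree to plain data: tag, attributes, text, children, tail."""
    return (elem.tag, sorted(elem.attrib.items()), elem.text,
            [snapshot(child) for child in elem], elem.tail)

utf8_bytes = TEXT.format(enc="UTF-8").encode("utf-8")
utf16_bytes = TEXT.format(enc="UTF-16").encode("utf-16")  # Python adds a BOM

# Different byte sequences on disk...
print(utf8_bytes == utf16_bytes)  # False
# ...but the same information once parsed.
print(snapshot(ET.fromstring(utf8_bytes)) == snapshot(ET.fromstring(utf16_bytes)))  # True
```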

Details of the serialized syntax are not part of the infoset. For example, the infoset does not distinguish between the two syntactic forms of an empty element.

The following are considered equivalent according to the XML Infoset.

<test></test>
<test/>
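One way to observe this equivalence in Python is Canonical XML: `xml.etree.ElementTree.canonicalize` (Python 3.8+, C14N 2.0) writes every empty element as a start/end tag pair, so both syntactic forms above yield identical canonical output. This is a minimal illustration, not a full infoset comparison.

```python
import xml.etree.ElementTree as ET

long_form = "<test></test>"
short_form = "<test/>"

# Canonicalization normalizes away the choice of empty-element syntax.
print(ET.canonicalize(long_form) == ET.canonicalize(short_form))  # True
print(ET.canonicalize(short_form))  # <test></test>
```"
assert ET.fromstring(long_form).text is None
assert ET.fromstring(short_form).text is None
</test>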

Augmentation

Infoset augmentation or infoset modification refers to the process of modifying the infoset during schema validation, for example by adding default attributes.
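Python's standard library does not perform XSD validation, so the sketch below is only a toy model of the idea: a hypothetical `DEFAULT_ATTRIBUTES` table stands in for the attribute defaults a schema would declare, and the hypothetical `augment` function adds them to elements that lack them. It is not how a real schema processor is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical "schema" fragment: default attribute values per element name.
DEFAULT_ATTRIBUTES = {
    "price": {"currency": "EUR"},
}

def augment(root, defaults):
    """Add declared default attributes that are missing from the parsed tree."""
    for elem in root.iter():
        for name, value in defaults.get(elem.tag, {}).items():
            if elem.get(name) is None:  # never override a value the document specifies
                elem.set(name, value)
    return root

doc = ET.fromstring('<order><price>10</price><price currency="USD">12</price></order>')
augment(doc, DEFAULT_ATTRIBUTES)
print(ET.tostring(doc, encoding="unicode"))
# <order><price currency="EUR">10</price><price currency="USD">12</price></order>
```

The design choice to skip attributes already present mirrors how defaults work: they only fill gaps and never replace a value written in the document.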
