The XML Infoset:

  • is a tree-based hierarchical representation of an XML document.
  • is abstract data and metadata: it is defined independently of any concrete representation or technical implementation
  • represents the significant information in an XML document

Having an infoset does not mean that an XML document conforms to an XSD and is therefore valid: any well-formed document has an infoset, whether or not it is valid.

Types of information items

An XML document's information set consists of a number of information items.

The information set for any well-formed XML document will contain at least a document information item and several others.

An information set can contain up to eleven different types of information items:

  • The Document Information Item (always present)
  • Element Information Items
  • Attribute Information Items
  • Processing Instruction Information Items
  • Unexpanded Entity Reference Information Items
  • Character Information Items
  • Comment Information Items
  • The Document Type Declaration Information Item
  • Unparsed Entity Information Items
  • Notation Information Items
  • Namespace Information Items
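
To make the list above more concrete, the following sketch walks a small document with Python's standard `xml.dom.minidom` and reports which kinds of information items it roughly corresponds to. The mapping from DOM node types to information items is an approximation made for illustration (the DOM is one implementation, not the Infoset itself), and the sample document and the `list_items` helper are hypothetical.

```python
from xml.dom import minidom, Node

# Approximate mapping from DOM node types to Infoset item names (an assumption
# made for illustration; the DOM and the Infoset are not identical models).
ITEM_NAMES = {
    Node.DOCUMENT_NODE: "document",
    Node.DOCUMENT_TYPE_NODE: "document type declaration",
    Node.ELEMENT_NODE: "element",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}

def list_items(node, found=None):
    """Collect information item names in document order."""
    if found is None:
        found = []
    if node.nodeType in ITEM_NAMES:
        found.append(ITEM_NAMES[node.nodeType])
    if node.nodeType == Node.ELEMENT_NODE:
        # One attribute information item per attribute on the element.
        found.extend("attribute" for _ in range(node.attributes.length))
    if node.nodeType == Node.TEXT_NODE:
        # The Infoset has one character information item per character,
        # whereas the DOM groups them into a single text node.
        found.extend("character" for _ in node.data)
    for child in node.childNodes:
        list_items(child, found)
    return found

doc = minidom.parseString('<?style x?><!-- note --><book id="1">Hi</book>')
items = list_items(doc)
print(items)
```

The document information item comes first, which matches the statement that it is always present.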

Each information item represents one construct of the document, such as an element, an attribute or a comment.

XML is just one way of representing that data.

The infoset may exist:

  • in memory as a DOM tree
  • as an XML document serialized in UTF-8 or in UTF-16.
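
A minimal sketch of this point, using Python's standard `xml.etree.ElementTree`: the same document is serialized once in UTF-8 and once in UTF-16, so the bytes differ, yet both parse to the same element name, attributes and text. The sample document is made up for the example, and comparing only these three properties is a simplification of a full infoset comparison.

```python
import xml.etree.ElementTree as ET

body = '<greeting lang="fr">café</greeting>'

# Two different byte serializations of the same document.
utf8_bytes = ('<?xml version="1.0" encoding="UTF-8"?>' + body).encode("utf-8")
utf16_bytes = ('<?xml version="1.0" encoding="UTF-16"?>' + body).encode("utf-16")

a = ET.fromstring(utf8_bytes)
b = ET.fromstring(utf16_bytes)

# The in-memory trees carry the same information even though the bytes differ.
same = (a.tag, a.attrib, a.text) == (b.tag, b.attrib, b.text)
print(utf8_bytes == utf16_bytes, same)
```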

For example, the infoset does not distinguish between the two forms of empty element.

An empty-element tag such as <a/> and a start-tag immediately followed by an end-tag such as <a></a> are considered equivalent according to the XML Infoset.
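
As a quick check of this equivalence, the sketch below parses both empty-element forms with Python's standard `xml.etree.ElementTree` and compares the results. The element name `item` is arbitrary, and `ET.canonicalize` (available from Python 3.8) is used here only as a convenient way to show that both forms normalize to the same text.

```python
import xml.etree.ElementTree as ET

short_form = "<item/>"
long_form = "<item></item>"

a = ET.fromstring(short_form)
b = ET.fromstring(long_form)

# Both forms yield an element with the same name, no attributes,
# no children and no character content.
same_tree = (
    a.tag == b.tag
    and a.attrib == b.attrib
    and len(a) == len(b) == 0
    and (a.text or "") == (b.text or "")
)

# Canonical XML writes both forms the same way.
same_canonical = ET.canonicalize(short_form) == ET.canonicalize(long_form)
print(same_tree, same_canonical)
```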



Infoset augmentation or infoset modification refers to the process of modifying the infoset during schema validation, for example by adding default attributes.
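
The idea can be sketched without a real schema validator. The example below is a toy stand-in, not an XSD implementation: the `SCHEMA_DEFAULTS` table and the `augment` function are hypothetical names that play the role of the defaults a schema would declare and of the validator that applies them.

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults a schema might declare:
# element name -> {attribute name: default value}
SCHEMA_DEFAULTS = {"item": {"currency": "EUR"}}

def augment(root, defaults):
    """Add each declared default attribute that an element does not already carry."""
    for element in root.iter():
        for name, value in defaults.get(element.tag, {}).items():
            element.attrib.setdefault(name, value)
    return root

source = '<order><item price="5"/><item price="7" currency="USD"/></order>'
root = augment(ET.fromstring(source), SCHEMA_DEFAULTS)

# The first item gains the default; the second keeps its explicit value.
currencies = [item.get("currency") for item in root.iter("item")]
print(currencies)
```

After the call, the in-memory tree holds an attribute that never appeared in the source text, which is the sense in which the infoset has been augmented.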

