XML - (Structured) Document


Documents are built from nodes: elements, and the text nodes that sit between them. These nodes form a tree, which the DOM represents.

An XML document should begin with an XML declaration, which specifies the version of XML being used.

A document begins in a “root” or document entity.

Each XML document contains one or more elements.

A data object is an XML document if it is well-formed, as defined in the XML specification.

An XML document is valid if it has an associated document type declaration and if the document complies with the constraints expressed in it.
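The difference between well-formed and valid can be made concrete with a small check. The sketch below uses Python's standard xml.dom.minidom, which is built on the non-validating expat parser, so it tests well-formedness only. Checking validity against a document type declaration would need a validating parser, which is not shown here. The helper name is_well_formed is made up for illustration.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML (validity is NOT checked)."""
    try:
        parseString(xml_text)
        return True
    except ExpatError:
        return False


good = "<sentence>This is an <bold>important</bold> idea.</sentence>"
bad = "<sentence>This is an <bold>important idea.</sentence>"  # <bold> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

A document with two top-level elements, such as `<a/><b/>`, is rejected as well, because a well-formed document has a single root element.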

An XML document may consist of one or many storage units, called entities.


Even though the text in an address book may not permit bold, italics, colors, and font sizes today, one day you may want to handle these things. Because DOM will handle virtually anything you throw at it, choosing DOM makes it easier to future-proof your application.


Document oriented

<memo importance='high'>
  <from>Paul V. Biron</from>
  <to>Ashok Malhotra</to>
  <subject>Latest draft</subject>
    We need to discuss the latest
    draft <emph>immediately</emph>.
    Either email me at <email>
    mailto:[email protected]</email>
    or call <phone>555-9876</phone>
</memo>

Text and elements can be freely intermixed in a DOM hierarchy. That kind of structure is called mixed content in the DOM model and occurs frequently in documents.

For example, suppose you wanted to represent this structure:

<sentence>This is an <bold>important</bold> idea.</sentence>

The hierarchy of DOM nodes would look something like this, where each line represents one node:

ELEMENT: sentence
   + TEXT: This is an
   + ELEMENT: bold
       + TEXT: important
   + TEXT: idea.

The sentence element contains text, followed by a sub-element, followed by additional text. It is the intermixing of text and elements that defines the mixed-content model.
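The node hierarchy above can be reproduced by walking a parsed document. This is a minimal sketch using Python's standard xml.dom.minidom rather than a Java DOM implementation. The function name describe and the indentation width are choices made for this example.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def describe(node, depth=0):
    """Return one line per node, indented by its depth in the tree."""
    indent = "   " * depth
    if node.nodeType == Node.ELEMENT_NODE:
        lines = [f"{indent}ELEMENT: {node.nodeName}"]
    elif node.nodeType == Node.TEXT_NODE:
        lines = [f"{indent}TEXT: {node.data.strip()}"]
    else:
        lines = [f"{indent}OTHER: {node.nodeName}"]
    # Text nodes have an empty child list, so the recursion stops there.
    for child in node.childNodes:
        lines.extend(describe(child, depth + 1))
    return lines


doc = parseString("<sentence>This is an <bold>important</bold> idea.</sentence>")
tree_lines = describe(doc.documentElement)
print("\n".join(tree_lines))
```

The sentence element ends up with three children (text, element, text), which is exactly the mixed-content shape described above.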

In this example, the “content” of the first element (its value) simply identifies the kind of node it is. First-time users of a DOM are usually thrown by this fact. After navigating to the <sentence> node, they ask for the node's “content”, and expect to get something useful. Instead, all they can find is the name of the element, sentence.

The value of an element is not the same as its content. In the W3C DOM, an element node's value (nodeValue) is null; its text lives in child text nodes.
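A short sketch of that surprise, again with Python's xml.dom.minidom as a stand-in for any W3C DOM implementation: the element reports its name and an empty value, and the readable text has to be collected from the text nodes beneath it. The helper text_content is written for this example and is not part of the library.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

doc = parseString("<sentence>This is an <bold>important</bold> idea.</sentence>")
sentence = doc.documentElement

print(sentence.nodeName)   # the element's name
print(sentence.nodeValue)  # element nodes carry no value of their own


def text_content(node):
    """Concatenate the data of every text node at or below `node`."""
    if node.nodeType == Node.TEXT_NODE:
        return node.data
    return "".join(text_content(child) for child in node.childNodes)


print(text_content(sentence))
```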


Data oriented

Libraries such as JDOM and dom4j, on the other hand, make it easier to do simple things, because each node in the hierarchy is an object.

Although JDOM and dom4j make allowances for elements having mixed content, they are not primarily designed for such situations. Instead, they are aimed at applications where the XML structure contains data.

The elements in a data structure typically contain either text or other elements, but not both. For example, here is a fragment of XML from an invoice:

   <name>Ashok Malhotra</name>
   <street>123 Microsoft Ave.</street>
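When every element holds either text or child elements, reading the data reduces to a simple mapping from element names to values. The sketch below illustrates the idea with Python's standard xml.etree.ElementTree, not with JDOM or dom4j. The `<address>` wrapper element is hypothetical, added only because the fragment needs a single root to be parsed.

```python
import xml.etree.ElementTree as ET

# <address> is a hypothetical wrapper; the fragment above shows no parent element.
xml_text = """
<address>
   <name>Ashok Malhotra</name>
   <street>123 Microsoft Ave.</street>
</address>
"""

root = ET.fromstring(xml_text)

# Each child holds plain text only, so tag -> text captures the whole record.
record = {child.tag: child.text for child in root}
print(record)
```

This shortcut works only because no element mixes text with sub-elements. For mixed content such as the sentence example, a node-by-node walk is still required.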

