Java XML - DOM Jaxp


About

This page covers the DOM API of Java SE (part of JAXP), which is used to process an XML file in Java.

For other DOM implementations, see Java XML - DOM.

Package

  • org.w3c.dom: Defines the Document interface (a DOM) as well as interfaces for all the components of a DOM.
  • javax.xml.transform.dom: Implements DOM-specific transformation APIs.

Entry point


See: https://docs.oracle.com/javase/tutorial/jaxp/intro/dom.html
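As a starting point, here is a minimal sketch of how an XML document can be read into a DOM tree with the JAXP factory classes. The class name DomParseExample, the helper parse, and the sample XML are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DomParseExample {

    /** Parses an XML string into a DOM tree using the JDK's default JAXP parser. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Namespace awareness is off by default; it is needed for getLocalName() and getNamespaceURI().
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document.
        Document doc = parse("<catalog><book id=\"1\">DOM</book></catalog>");
        System.out.println(doc.getDocumentElement().getNodeName()); // prints "catalog"
    }
}
```

To read from a file instead, DocumentBuilder.parse also accepts a java.io.File or an InputStream.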

How to

The process of navigating to a node involves processing sub-elements, ignoring the ones you are not interested in and inspecting the ones you are, until you find the node you are interested in.

Generally, the vast majority of nodes in a DOM tree will be Element and Text nodes.

Obtaining Node information

The DOM node element type information is obtained by calling the various methods of the org.w3c.dom.Node interface.

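// Fragment: n is the node being inspected and out is the
// PrintStream (or PrintWriter) used for output; both come from the surrounding program.
Node n;
String val;
val = n.getNodeName();
val = n.getNamespaceURI();
val = n.getPrefix();
val = n.getLocalName();
val = n.getNodeValue();   // null for node types that carry no value, such as elements
if (val != null) {
    out.print(" nodeValue=");
    if (val.trim().equals("")) {
        // Whitespace-only value
        out.print("[WS]");
    } else {
        out.print("\"" + val + "\"");
    }
}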

Every DOM node has at least a type, a name, and a value, which might or might not be empty.
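The following sketch shows the type, name, and value of three common kinds of node. The class NodeInfoExample, its helpers, and the sample XML are illustrative names, not part of the DOM API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeInfoExample {

    /** Formats the numeric type code, the name, and the value of a node. */
    static String describe(Node n) {
        return "type=" + n.getNodeType() + " name=" + n.getNodeName() + " value=" + n.getNodeValue();
    }

    /** Parses the XML string and returns its root element. */
    static Node parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Node root = parseRoot("<a>hi<!--c--></a>");
        System.out.println(describe(root));                 // type=1 name=a value=null
        System.out.println(describe(root.getFirstChild())); // type=3 name=#text value=hi
        System.out.println(describe(root.getLastChild()));  // type=8 name=#comment value=c
    }
}
```

The type codes correspond to the constants Node.ELEMENT_NODE (1), Node.TEXT_NODE (3), and Node.COMMENT_NODE (8).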

Lexical Information Control

Lexical information is the information you need to reconstruct the original syntax of an XML document. Preserving lexical information is important in editing applications, where you want to save a document that is an accurate and complete reflection of the original.

Lexical markup such as CDATA sections, entity references, and comments may or may not be included in the resulting DOM.

The following DocumentBuilderFactory methods give you control over these lexical nodes and over whitespace. Setting an option to false preserves the corresponding lexical information. Most options default to false; the Javadoc for setExpandEntityReferences, however, documents a default of true.

API                                     Preserve Lexical Info  Focus on Content  Description
setCoalescing()                         False                  True              To convert CDATA nodes to Text nodes and append to an adjacent Text node (if any).
setExpandEntityReferences()             False                  True              To expand entity reference nodes.
setIgnoringComments()                   False                  True              To ignore comments.
setIgnoringElementContentWhitespace()   False                  True              To ignore whitespace that is not a significant part of element content.
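A small sketch of the effect of two of these settings, assuming the JDK's built-in parser. The class LexicalControlExample, the helper countLexicalNodes, and the sample XML are hypothetical. The helper counts how many CDATA and comment nodes survive under the root element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LexicalControlExample {

    /** Counts the CDATA and comment children of the root element under the given settings. */
    static int countLexicalNodes(String xml, boolean focusOnContent) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setCoalescing(focusOnContent);       // true: CDATA sections become plain text
        factory.setIgnoringComments(focusOnContent); // true: comments are dropped
        Node root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();

        NodeList children = root.getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            short type = children.item(i).getNodeType();
            if (type == Node.CDATA_SECTION_NODE || type == Node.COMMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a>x<![CDATA[y]]><!--c-->z</a>";
        System.out.println(countLexicalNodes(xml, false)); // 2 (one CDATA section, one comment)
        System.out.println(countLexicalNodes(xml, true));  // 0
    }
}
```

setIgnoringElementContentWhitespace is left out of the sketch because the parser can only tell which whitespace is ignorable when it has a DTD to consult.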

Reading XML Data into a DOM

Node attributes are not included as children in the DOM hierarchy. They are instead obtained via the Node interface's getAttributes method.

The DocumentType interface is an extension of org.w3c.dom.Node. It defines the getEntities method, which you use to obtain Entity nodes - the nodes that define entities. Like Attribute nodes, Entity nodes do not appear as children of DOM nodes.
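The sketch below illustrates the point about attributes: they are reached through getAttributes, not through the child list. Only attributes are shown, not entities. AttributesExample, its helpers, and the sample element are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class AttributesExample {

    /** Parses the XML string and returns its root element. */
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    /** Returns the value of the named attribute, or null if the node has no such attribute. */
    static String attributeValue(Node node, String name) {
        NamedNodeMap attrs = node.getAttributes(); // null for nodes that are not elements
        if (attrs == null) {
            return null;
        }
        Node attr = attrs.getNamedItem(name);
        return attr == null ? null : attr.getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        Element book = parseRoot("<book id=\"1\" lang=\"en\"/>");
        System.out.println(book.getChildNodes().getLength()); // 0: attributes are not children
        System.out.println(book.getAttributes().getLength()); // 2
        System.out.println(attributeValue(book, "lang"));     // en
    }
}
```

The order of entries in a NamedNodeMap is not specified, so the sketch looks attributes up by name instead of by index.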

Creating Nodes

You can create different types of nodes using the methods of the Document interface.

For example:

  • createElement,
  • createComment,
  • createCDATASection,
  • createTextNode, and so on.

The full list of methods for creating different nodes is provided in the API documentation for org.w3c.dom.Document.
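A minimal sketch of these factory methods in use, building a small document from scratch. The class CreateNodesExample and the element and text content are invented for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class CreateNodesExample {

    /** Builds a document holding one element with a comment, a text node, and a CDATA section. */
    static Document build() throws Exception {
        // newDocument() returns an empty DOM tree to which nodes can be added.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        Element root = doc.createElement("note");
        doc.appendChild(root);

        // A created node belongs to the document but is not in the tree until it is appended.
        root.appendChild(doc.createComment("generated"));
        root.appendChild(doc.createTextNode("Hello "));
        root.appendChild(doc.createCDATASection("<world>"));
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Element root = build().getDocumentElement();
        System.out.println(root.getChildNodes().getLength()); // 3
        System.out.println(root.getTextContent());            // Hello <world>
    }
}
```

getTextContent skips the comment, which is why only the text and CDATA content appear in the second line of output.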

Traversing Nodes

The org.w3c.dom.Node interface defines a number of methods you can use to traverse nodes, including:

  • getFirstChild,
  • getLastChild,
  • getNextSibling,
  • getPreviousSibling,
  • and getParentNode.

Those operations are sufficient to get from anywhere in the tree to any other location in the tree.
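As an illustration, a depth-first walk over the whole tree needs only getFirstChild and getNextSibling. The sketch below collects element names in document order. TraverseExample and its helper names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TraverseExample {

    /** Visits node and its descendants depth-first, recording the name of every element. */
    static void collect(Node node, List<String> names) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, names);
        }
    }

    /** Parses the XML string and returns its element names in document order, space-separated. */
    static String elementNames(String xml) throws Exception {
        Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        List<String> names = new ArrayList<>();
        collect(root, names);
        return String.join(" ", names);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<a><b><c/></b><d/></a>")); // a b c d
    }
}
```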

Searching for Nodes

Although it is tempting to get the first child and inspect it to see whether it is the right one, the search must account for the fact that the first child in the sub-list could be a comment or a processing instruction. If the XML data has not been validated, it could even be a text node containing ignorable whitespace.

In essence, you need to look through the list of child nodes, ignoring the ones that are of no concern and examining the ones you care about. Here is an example of the kind of routine you need to write when searching for nodes in a DOM hierarchy.

/**
 * Find the named subnode in a node's sublist.
 * <ul>
 * <li>Ignores comments and processing instructions.
 * <li>Ignores TEXT nodes (likely to exist and contain
 *         ignorable whitespace, if not validating).
 * <li>Ignores CDATA nodes and EntityRef nodes.
 * <li>Examines element nodes to find one with
 *        the specified name.
 * </ul>
 * @param name  the tag name for the element to find
 * @param node  the element node to start searching from
 * @return the Node found, or null if there is no such child
 */
public Node findSubNode(String name, Node node) {
    if (node.getNodeType() != Node.ELEMENT_NODE) {
        System.err.println(
                "Error: Search node not of element type");
        System.exit(22);
    }

    if (! node.hasChildNodes()) return null;

    NodeList list = node.getChildNodes();
    for (int i=0; i < list.getLength(); i++) {
        Node subnode = list.item(i);
        if (subnode.getNodeType() == Node.ELEMENT_NODE) {
            if (subnode.getNodeName().equals(name)) return subnode;
        }
    }
    return null;
}
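To see why the loop is needed, the sketch below parses a small indented document without validation. The first child of the root turns out to be a whitespace text node, not the element being sought. The helper firstChildElement is a compact, hypothetical variant of the routine above that walks siblings directly. The sample XML is invented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FirstChildPitfallExample {

    /** Parses the XML string and returns its root element. */
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    /** Returns the first child element with the given tag name, or null if there is none. */
    static Element firstChildElement(Node parent, String name) {
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE && child.getNodeName().equals(name)) {
                return (Element) child;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Element config = parseRoot("<config>\n  <!-- note -->\n  <port>8080</port>\n</config>");
        // The indentation before the comment is a text node, so it comes first.
        System.out.println(config.getFirstChild().getNodeType() == Node.TEXT_NODE); // true
        System.out.println(firstChildElement(config, "port").getTextContent());     // 8080
    }
}
```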

Obtaining Node Content

When you want to get the text that a node contains, you again need to look through the list of child nodes, ignoring entries that are of no concern and accumulating the text you find in:

  • TEXT nodes,
  • CDATA nodes,
  • and EntityRef nodes.

/**
  * Return the text that a node contains. This routine:<ul>
  * <li>Ignores comments and processing instructions.
  * <li>Concatenates TEXT nodes, CDATA nodes, and the results of
  *     recursively processing EntityRef nodes.
  * <li>Ignores any element nodes in the sublist.
  *     (Other possible options are to recurse into element 
  *      sublists or throw an exception.)
  * </ul>
  * @param    node  a  DOM node
  * @return   a String representing its contents
  */
public String getText(Node node) {
    StringBuilder result = new StringBuilder();
    if (! node.hasChildNodes()) return "";

    NodeList list = node.getChildNodes();
    for (int i=0; i < list.getLength(); i++) {
        Node subnode = list.item(i);
        if (subnode.getNodeType() == Node.TEXT_NODE) {
            result.append(subnode.getNodeValue());
        }
        else if (subnode.getNodeType() ==
                Node.CDATA_SECTION_NODE) 
        {
            result.append(subnode.getNodeValue());
        }
        else if (subnode.getNodeType() ==
                Node.ENTITY_REFERENCE_NODE) 
        {
            // Recurse into the subtree for text
            // (and ignore comments)
            result.append(getText(subnode));
        }
    }
    return result.toString();
}
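The Node interface also offers getTextContent (DOM Level 3), which recurses into child elements instead of ignoring them. The sketch below contrasts the two behaviours. directText is a simplified, hypothetical stand-in for the routine above that handles only TEXT and CDATA children, and the sample XML is invented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TextContentExample {

    /** Parses the XML string and returns its root element. */
    static Node parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    /** Concatenates the TEXT and CDATA children of a node, ignoring child elements. */
    static String directText(Node node) {
        StringBuilder result = new StringBuilder();
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                result.append(child.getNodeValue());
            }
        }
        return result.toString();
    }

    public static void main(String[] args) throws Exception {
        Node p = parseRoot("<p>Hello <b>big </b><![CDATA[& wide]]> world</p>");
        System.out.println(directText(p));       // Hello & wide world
        System.out.println(p.getTextContent());  // Hello big & wide world
    }
}
```

Which behaviour is appropriate depends on whether nested markup should contribute to the extracted text.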

Creating Attributes

The org.w3c.dom.Element interface, which extends Node, defines a setAttribute operation, which adds an attribute to that node. (A better name from the Java platform standpoint would have been addAttribute. The attribute is not a property of the class, and a new object is created.) You can also use the Document's createAttribute operation to create an instance of Attr and then use the setAttributeNode method to add it.
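Both routes can be sketched as follows. CreateAttributesExample and the attribute names are invented for illustration. The last call shows that calling setAttribute with an existing name changes that attribute's value instead of adding a second one.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class CreateAttributesExample {

    /** Builds a detached element carrying two attributes, one added by each route. */
    static Element build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element book = doc.createElement("book");

        // Route 1: name and value in one call.
        book.setAttribute("id", "1");

        // Route 2: create the Attr node first, then attach it.
        Attr lang = doc.createAttribute("lang");
        lang.setValue("en");
        book.setAttributeNode(lang);

        // Reusing an existing name changes the value of that attribute.
        book.setAttribute("id", "2");
        return book;
    }

    public static void main(String[] args) throws Exception {
        Element book = build();
        System.out.println(book.getAttribute("id"));          // 2
        System.out.println(book.getAttribute("lang"));        // en
        System.out.println(book.getAttributes().getLength()); // 2
    }
}
```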

Removing and Changing Nodes

To remove a node, you use its parent Node's removeChild method. To change it, you can use either the parent node's replaceChild operation or the node's setNodeValue operation.

Inserting Nodes

The important thing to remember when creating new nodes is that when you create an element node, the only data you specify is a name. In effect, that node gives you a hook to hang things on. You hang an item on the hook by adding to its list of child nodes. For example, you might add:

  • a text node,
  • a CDATA node,
  • or an attribute node.
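The sketch below strings together inserting, changing, replacing, and removing nodes on one small tree. ModifyNodesExample, its helpers, and the element names are hypothetical. The comments track the list content after each step.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class ModifyNodesExample {

    /** Creates an <item> element and hangs a text node on it. */
    static Element item(Document doc, String text) {
        Element e = doc.createElement("item");
        e.appendChild(doc.createTextNode(text));
        return e;
    }

    /** Joins the text of each child of the list with commas. */
    static String itemTexts(Element list) {
        StringBuilder sb = new StringBuilder();
        for (Node c = list.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(c.getTextContent());
        }
        return sb.toString();
    }

    static Element build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element list = doc.createElement("list");
        doc.appendChild(list);

        Element one = item(doc, "one");
        Element three = item(doc, "three");
        list.appendChild(one);
        list.appendChild(three);                     // one,three

        Element two = item(doc, "two");
        list.insertBefore(two, three);               // one,two,three

        one.getFirstChild().setNodeValue("ONE");     // ONE,two,three
        list.replaceChild(item(doc, "four"), three); // ONE,two,four
        list.removeChild(two);                       // ONE,four
        return list;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(itemTexts(build())); // ONE,four
    }
}
```

Note that setNodeValue is called on the text node, not on the element. An element's own node value is null, so setting it has no effect.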
