With the prevalence of XML as the markup language for platform-independent data exchange, there is an increasing need for a standard that enables non-XML-based applications to submit complex queries to XML documents.

Note

The Extensible Markup Language (XML) is a markup language for representing hierarchically structured data in text form. XML is readable by humans and machines alike. One of its uses is exchanging data between computer systems on the World Wide Web.

For program-controlled access to XML documents, the W3 Consortium (W3C) developed the standards XQuery and XSLT. They provide programming interfaces that applications can use to access XML documents, query their content or transform them. Both require a standard for addressing the elements of an XML document: the path language XPath.

We’ll get you started with the XPath Data Model (XDM) and introduce you to the syntax of the XPath expressions used to locate XML elements.


What is XPath?

The XML Path Language (XPath) is a path language for XML documents developed by the W3 Consortium. XPath provides a non-XML syntax that makes it possible to address specific elements of an XML document.

XPath is normally embedded in a host language that processes the addressed XML elements. XQuery, for example, uses XPath to query the XML elements it addresses, and XSLT uses XPath when transforming XML documents.

  • XPath: Navigation in XML documents
  • XQuery: Queries for XML documents
  • XSLT: Transformation of XML documents

The current XPath version, 3.1, is specified in the W3C Recommendation of March 21, 2017.

Note

Despite ongoing development, numerous XSLT processors, web browsers and applications still support only XPath 1.0, the standard from 1999.

How Does XPath Work?

XPath is based on a data model that interprets an XML document as a tree of nodes. This tree structure is comparable to the Document Object Model (DOM), which serves as the interface between HTML documents and JavaScript in the web browser.

XML elements are located by means of paths whose notation is modelled on the Unix directory system. The basic components of such a location path are nodes, axes, node tests and predicates.

Node Types

The individual components of an XPath tree structure are referred to as nodes. The nodes are ordered both by their position in the document (document order) and by the nesting of the XML elements.

The XPath data model distinguishes seven node types with different functions:

  • Element node
  • Document node (from XPath 2.0 onwards; previously known as the root node)
  • Attribute node
  • Text node
  • Namespace node
  • Processing instruction node
  • Comment node

The following example illustrates the node types of the XPath data model. The XML document below, used to exchange data for a book order, contains all seven node types.

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE Order SYSTEM "order.dtd">
<?xml-stylesheet type="text/css" href="style.css"?>
<!--This is a comment!-->
<order date="2019-02-01">
    <address xmlns:shipping="http://localhost/XML/delivery" xmlns:billing="http://localhost/XML/billing">
        <shipping:name>Ellen Adams</shipping:name>
        <shipping:street>123 Maple Street</shipping:street>
        <shipping:city>Mill Valley</shipping:city>
        <shipping:state>CA</shipping:state>
        <shipping:zip>10999</shipping:zip>
        <shipping:country>USA</shipping:country>
        <billing:name>Mary Adams</billing:name>
        <billing:street>8 Oak Avenue</billing:street>
        <billing:city>Old Town</billing:city>
        <billing:state>PA</billing:state>
        <billing:zip>95819</billing:zip>
        <billing:country>USA</billing:country>
    </address>
    <comment>Please use gift wrapping!</comment>
    <items>
        <book isbn="9781408845660">
            <title>Harry Potter and the Prisoner of Azkaban</title>
            <quantity>1</quantity>
            <priceus>22.94</priceus>
            <comment>Please confirm delivery date before Christmas.</comment>
        </book>
        <book isbn="9780544003415">
            <title>The Lord of the Rings</title>
            <quantity>1</quantity>
            <priceus>17.74</priceus>
        </book>
    </items>
</order>
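
To see the node types in practice, the sketch below parses a shortened copy of the order document with Python’s standard xml.dom.minidom module and counts the nodes per type. It assumes that DOM node types can stand in for the XPath data model. That works for six of the seven types, but the DOM has no separate namespace nodes (it reports xmlns declarations as attributes), so the namespaced address element is left out, as is the DOCTYPE line. Whitespace-only text nodes from the indentation are skipped.

```python
from collections import Counter
from xml.dom import Node, minidom

# Shortened copy of the order document (no DOCTYPE, no namespaced address).
ORDER_XML = """<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/css" href="style.css"?>
<!--This is a comment!-->
<order date="2019-02-01">
    <comment>Please use gift wrapping!</comment>
    <items>
        <book isbn="9781408845660">
            <title>Harry Potter and the Prisoner of Azkaban</title>
        </book>
    </items>
</order>"""

# DOM node-type constants mapped to the XPath data model names used above.
XPATH_NAMES = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.ATTRIBUTE_NODE: "attribute",
    Node.TEXT_NODE: "text",
    Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
    Node.COMMENT_NODE: "comment",
}


def count_node_types(node, counter=None):
    """Walk the DOM tree recursively and count nodes per XPath node type."""
    counter = Counter() if counter is None else counter
    # Skip whitespace-only text nodes created by the indentation.
    if node.nodeType == Node.TEXT_NODE and not node.data.strip():
        return counter
    counter[XPATH_NAMES[node.nodeType]] += 1
    attributes = getattr(node, "attributes", None)
    if attributes:  # attributes belong to the element but are not its children
        for i in range(len(attributes)):
            counter[XPATH_NAMES[attributes.item(i).nodeType]] += 1
    for child in node.childNodes:
        count_node_types(child, counter)
    return counter


counts = count_node_types(minidom.parseString(ORDER_XML))
print(dict(counts))
```

Note that the element named comment is counted as an element node, while only <!--This is a comment!--> counts as a comment node.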

Element Node

In the tree structure of the XPath data model, each element of the XML document corresponds to an element node. The XML declaration and the document type definition at the beginning of the document are not elements and therefore do not become element nodes.

XML declaration:

<?xml version="1.0" encoding="utf-8"?>

Document Type Definition (DTD):

<!DOCTYPE Order SYSTEM "order.dtd">

Element nodes begin with a start tag, end with an end tag and are usually nested inside one another.

The outermost element, which encloses all other elements of the document, is referred to as the root element.

In the XML document above, for example, the element node order is the root element. It is the parent element of the element nodes address, comment and items, which in turn contain further element nodes as child elements.
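
Python’s standard xml.etree.ElementTree module is not an XPath engine, but its tree mirrors the element structure, so the root element and its children can be inspected directly. The XML below is a trimmed skeleton of the example document, kept only for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed skeleton of the order document: only the element structure is kept.
ORDER_XML = """<order date="2019-02-01">
    <address/>
    <comment>Please use gift wrapping!</comment>
    <items>
        <book isbn="9781408845660"/>
        <book isbn="9780544003415"/>
    </items>
</order>"""

root = ET.fromstring(ORDER_XML)  # fromstring() returns the root element
print(root.tag)                                   # order
print([child.tag for child in root])              # ['address', 'comment', 'items']
print([book.tag for book in root.find("items")])  # ['book', 'book']
```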

Document Node

The root of the tree structure is referred to as the document node. It is neither visible nor represented by text in the XML document itself; it is a conceptual node that contains all other nodes of the document. The children of the document node are the root element and, where applicable, processing instruction nodes and comment nodes.
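
The DOM, as implemented in Python’s xml.dom.minidom, also places a document node above the root element. This sketch, using a trimmed version of the example (the root element is reduced to a placeholder), shows that the document node’s children are the processing instruction, the comment and the root element order, while the XML declaration does not appear as a node.

```python
from xml.dom import Node, minidom

# Prolog of the example document plus a reduced root element.
ORDER_XML = """<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/css" href="style.css"?>
<!--This is a comment!-->
<order date="2019-02-01"><items/></order>"""

document = minidom.parseString(ORDER_XML)
print(document.nodeType == Node.DOCUMENT_NODE)  # True
print(document.documentElement.tagName)         # order
# Children of the document node, in document order
print([type(child).__name__ for child in document.childNodes])
# ['ProcessingInstruction', 'Comment', 'Element']
```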

Attribute Node

The attributes of an XML element are represented in the XPath data model as attribute nodes. Each attribute node consists of a name and the value assigned to the attribute.

In the code example, the first element node book has the attribute node isbn with the value 9781408845660:

<book isbn="9781408845660">

Attribute nodes are considered part of the element node: the element is the parent of its attribute nodes, but the attribute nodes are not children of the element.
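
ElementTree reflects the same distinction: attributes are stored on the element, but they are not part of its list of children. A minimal sketch using the book element from the example:

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<book isbn="9781408845660">'
    "<title>Harry Potter and the Prisoner of Azkaban</title>"
    "</book>"
)
print(book.get("isbn"))               # 9781408845660
print(book.attrib)                    # {'isbn': '9781408845660'}
print([child.tag for child in book])  # ['title']  (the attribute is not a child)
```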

Text Node

The character data between the start and end tags of an element node is represented as a text node.

In the code example, the element node title contains the following text node:

Harry Potter and the Prisoner of Azkaban
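
In ElementTree, the character data of the text node directly inside an element is available through the element’s text attribute, as this short sketch shows:

```python
import xml.etree.ElementTree as ET

title = ET.fromstring("<title>Harry Potter and the Prisoner of Azkaban</title>")
print(title.tag)   # title  (the element node)
print(title.text)  # Harry Potter and the Prisoner of Azkaban  (its text content)
```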

Namespace Node

Element and attribute names in an XML document can be assigned to a namespace. This makes it possible to tell apart names that come from different vocabularies.

Namespaces are declared in the start tag of an element, either with the xmlns attribute (default namespace) or with an attribute of the form xmlns:prefix. The value of the attribute is a Uniform Resource Identifier (URI) that identifies the namespace. A default namespace applies to the element and its unprefixed descendant elements; a prefix can be used on the element itself and on all of its descendants. Each namespace corresponds to a namespace node in the tree structure.

In the code example, two namespaces are declared for the XML element address: xmlns:shipping and xmlns:billing. The child elements of address carry the respective namespace as a prefix.

<address xmlns:shipping="http://localhost/XML/delivery" xmlns:billing="http://localhost/XML/billing">
    <shipping:name>Ellen Adams</shipping:name>
    <shipping:street>123 Maple Street</shipping:street>
    <shipping:city>Mill Valley</shipping:city>
    <shipping:state>CA</shipping:state>
    <shipping:zip>10999</shipping:zip>
    <shipping:country>USA</shipping:country>
    <billing:name>Mary Adams</billing:name>
    <billing:street>8 Oak Avenue</billing:street>
    <billing:city>Old Town</billing:city>
    <billing:state>PA</billing:state>
    <billing:zip>95819</billing:zip>
    <billing:country>USA</billing:country>
</address>

Namespace prefixes make it possible to distinguish elements of the same name from different namespaces. The element street with the prefix shipping, for example, contains the street of the delivery address, whereas the element street with the prefix billing contains the street of the billing address.
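
ElementTree resolves prefixes through a prefix-to-URI dictionary passed to find() and findall(). The sketch below uses a shortened address element; the dictionary name NAMESPACES is our own choice. Matching is done on the namespace URI, so the prefixes in the dictionary only have to map to the right URIs.

```python
import xml.etree.ElementTree as ET

ADDRESS_XML = """<address xmlns:shipping="http://localhost/XML/delivery"
         xmlns:billing="http://localhost/XML/billing">
    <shipping:street>123 Maple Street</shipping:street>
    <billing:street>8 Oak Avenue</billing:street>
</address>"""

# Prefix-to-URI mapping used when querying the tree.
NAMESPACES = {
    "shipping": "http://localhost/XML/delivery",
    "billing": "http://localhost/XML/billing",
}

address = ET.fromstring(ADDRESS_XML)
print(address.find("shipping:street", NAMESPACES).text)  # 123 Maple Street
print(address.find("billing:street", NAMESPACES).text)   # 8 Oak Avenue
# Internally, ElementTree stores each name as {namespace-URI}local-name.
print([child.tag for child in address])
```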

Processing Instruction Node

Processing instructions in XML documents are referred to in XPath terminology as processing instruction nodes. They can appear inside the root element or, as in the example, before it in the document prolog. A processing instruction begins with "<?" and ends with "?>".

In the code example presented above, you’ll find the following processing instruction:

<?xml-stylesheet type="text/css" href="style.css"?>

Although the XML declaration at the beginning of the XML file is syntactically constructed like a processing instruction, it is not a processing instruction node in the XPath data model.

Comment Node

Content of an XML document that is marked as a comment is represented in XPath as a comment node. The node comprises only the character content of the comment, not the markup <!-- and -->.

In the code example presented above, you’ll find the following comment node:

This is a comment!
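
Both node types can be inspected with xml.dom.minidom. The sketch parses the prolog of the example document (with an empty placeholder root element) and reads the target and data of the processing instruction as well as the content of the comment node.

```python
from xml.dom import minidom

ORDER_XML = """<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/css" href="style.css"?>
<!--This is a comment!-->
<order/>"""

document = minidom.parseString(ORDER_XML)
pi, comment = document.childNodes[0], document.childNodes[1]

# The first child is the stylesheet instruction; the XML declaration is not a node.
print(pi.target)     # xml-stylesheet
print(pi.data)       # type="text/css" href="style.css"
# The comment node holds only the text between <!-- and -->.
print(comment.data)  # This is a comment!
```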

Location Path

Nodes are addressed with the help of a location path: an XPath expression that navigates through the tree structure and selects the desired node set. This node set is the result of the XPath expression.

Location paths are evaluated from left to right. XPath distinguishes between absolute and relative location paths. An absolute location path begins at the document node and is marked by a leading slash (/). A relative location path begins at an arbitrary node within the tree structure, referred to as the context node.

A location path consists of individual location steps that, as when addressing files in a directory system, are separated by a slash (/).

Each location step consists of up to three parts: an axis, a node test and any number of predicates.

  • Axis: The axis determines the direction of navigation in the tree structure, starting from the context node (or, for an absolute path, from the document node).
  • Node test: The node test is a filter that restricts the nodes on the axis to the desired node set.
  • Predicates: Predicates filter the nodes selected through the axis and node test further.

Each location step of an XPath expression is written with the following syntax:

axis::nodetest[predicate1][predicate2]…

Notation | Function
--- | ---
/ | Separates two location steps
:: | Separates the axis from the node test
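
To make the structure of a location step concrete, here is a small parser sketch. The function parse_location_path is hypothetical (not part of any library): it splits an unabbreviated location path into its steps and each step into axis, node test and predicates. It assumes that predicates contain neither slashes nor nested brackets, which covers the examples in this article but not general XPath.

```python
import re

# One location step: optional "axis::", a node test, then zero or more [predicates].
STEP = re.compile(r"^(?:(?P<axis>[a-z-]+)::)?(?P<test>[^\[\]]+)(?P<preds>(?:\[[^\[\]]*\])*)$")


def parse_location_path(path):
    """Return (is_absolute, [(axis, node_test, [predicates]), ...]) for a simple path."""
    absolute = path.startswith("/")
    steps = []
    for step in path.lstrip("/").split("/"):  # assumption: no "/" inside predicates
        match = STEP.match(step)
        if match is None:
            raise ValueError(f"unsupported location step: {step!r}")
        axis = match.group("axis") or "child"  # child is the default axis
        predicates = re.findall(r"\[([^\[\]]*)\]", match.group("preds"))
        steps.append((axis, match.group("test"), predicates))
    return absolute, steps


print(parse_location_path("/descendant::book/attribute::isbn"))
print(parse_location_path("child::book[child::comment][2]"))
```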

Axes

The XPath syntax supports navigation along the following axes:

Axis | Selected nodes
--- | ---
child | All direct child nodes
parent | The direct parent node
descendant | All descendant nodes (children, grandchildren and so on)
ancestor* | All ancestor nodes (parent, grandparent and so on)
following | All nodes that follow the context node in document order, excluding its descendants
preceding* | All nodes that precede the context node in document order, excluding its ancestors
following-sibling | All following nodes in the XML document that have the same parent node
preceding-sibling* | All preceding nodes in the XML document that have the same parent node
attribute | All attribute nodes of an element node
namespace | All namespace nodes of an element node; this axis is deprecated as of XPath 2.0
self | The context node itself
descendant-or-self | All descendant nodes including the context node
ancestor-or-self* | All ancestor nodes including the context node

Note

The axes marked with an asterisk (*) are reverse axes. According to the XPath 1.0 specification, they are an optional component and do not have to be supported by standard-compliant applications.

[Figure: Schematic representation of the most important axes in the XPath data model, starting from the context node (red)]

In this figure, for example, the child:: axis selects the nodes E, H and I, starting from the context node D.
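
The axis definitions can be sketched in Python on top of ElementTree. The helper evaluate_axis below is hypothetical and simplified: it only considers element nodes (ElementTree keeps text, comments and attributes outside its node tree), builds the missing parent links from a dictionary, and returns each node set in document order. The namespace and attribute axes are not covered.

```python
import xml.etree.ElementTree as ET


def evaluate_axis(context, axis, root):
    """Return the element nodes on `axis` as seen from `context`, in document order."""
    parents = {child: parent for parent in root.iter() for child in parent}
    ancestors = []  # outermost ancestor first
    node = context
    while node in parents:
        node = parents[node]
        ancestors.insert(0, node)
    in_document_order = list(root.iter())
    position = in_document_order.index(context)
    descendants = list(context.iter())[1:]  # iter() yields the context node first
    siblings = list(parents[context]) if context in parents else [context]
    index = siblings.index(context)
    axes = {
        "self": [context],
        "child": list(context),
        "parent": ancestors[-1:],
        "ancestor": ancestors,
        "ancestor-or-self": ancestors + [context],
        "descendant": descendants,
        "descendant-or-self": [context] + descendants,
        "following-sibling": siblings[index + 1:],
        "preceding-sibling": siblings[:index],
        "following": [n for n in in_document_order[position + 1:] if n not in descendants],
        "preceding": [n for n in in_document_order[:position] if n not in ancestors],
    }
    return axes[axis]


def tags(nodes):
    return [node.tag for node in nodes]


ORDER_XML = """<order date="2019-02-01">
    <address/>
    <comment>Please use gift wrapping!</comment>
    <items>
        <book isbn="9781408845660"><title>Harry Potter and the Prisoner of Azkaban</title></book>
        <book isbn="9780544003415"><title>The Lord of the Rings</title></book>
    </items>
</order>"""

root = ET.fromstring(ORDER_XML)
first_book = root.find("items/book")
print(tags(evaluate_axis(first_book, "ancestor", root)))   # ['order', 'items']
print(tags(evaluate_axis(first_book, "following", root)))  # ['book', 'title']
print(tags(evaluate_axis(first_book, "preceding", root)))  # ['address', 'comment']
```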

Node Test

The node test defines a filter for the node set selected via the axis. The XPath specification provides two possible filter criteria:

  • Node name: Specify a node name as the node test to select all nodes on the chosen axis that have this name.
  • Node type: Specify a node type as the node test to select all nodes on the chosen axis that are of this type.

Node Names as a Filter Criterion

With the following location path, for example, you could select all descendants named book in the code example above, starting from the document node:

/descendant::book

If, however, you want to select the attribute isbn of all element nodes named book, you need a location path with two location steps:

/descendant::book/attribute::isbn
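
Python’s ElementTree has no full XPath support, but these two location paths can be reproduced with its iter() and get() methods. A sketch on a trimmed copy of the example document:

```python
import xml.etree.ElementTree as ET

ORDER_XML = """<order date="2019-02-01">
    <items>
        <book isbn="9781408845660"><title>Harry Potter and the Prisoner of Azkaban</title></book>
        <book isbn="9780544003415"><title>The Lord of the Rings</title></book>
    </items>
</order>"""

root = ET.fromstring(ORDER_XML)
# /descendant::book : every element named book, in document order
books = list(root.iter("book"))
# /descendant::book/attribute::isbn : the isbn attribute of each of those books
isbns = [book.get("isbn") for book in books]
print(len(books))  # 2
print(isbns)       # ['9781408845660', '9780544003415']
```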

Node Type as Filter Criterion

If you’d like to define a node type as a filter criterion for selecting the node set, use one of the following functions as a node test:

Function | Selected nodes
--- | ---
node() | All nodes on the chosen axis
text() | All text nodes on the chosen axis
comment() | All comment nodes on the chosen axis
processing-instruction() | All processing instruction nodes on the chosen axis

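
The node type tests can be sketched with xml.dom.minidom, whose nodes carry a nodeType. The helper child_axis and the NODE_TYPE_TESTS table are hypothetical and apply a test to the child axis only. The comment element with an embedded <!--internal note--> is invented for the illustration; it shows that an element named comment is not matched by comment().

```python
from xml.dom import Node, minidom

# Hypothetical table: XPath 1.0 node type tests mapped to DOM nodeType checks.
NODE_TYPE_TESTS = {
    "node()": lambda node: True,
    "text()": lambda node: node.nodeType == Node.TEXT_NODE,
    "comment()": lambda node: node.nodeType == Node.COMMENT_NODE,
    "processing-instruction()": lambda node: node.nodeType == Node.PROCESSING_INSTRUCTION_NODE,
}


def child_axis(context, node_test):
    """Apply a node type test to the child axis of `context` (sketch)."""
    test = NODE_TYPE_TESTS[node_test]
    return [child for child in context.childNodes if test(child)]


document = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/css" href="style.css"?>'
    "<!--This is a comment!-->"
    "<comment>Please use gift wrapping!<!--internal note--></comment>"
)
element = document.documentElement  # the <comment> element, not a comment node

print(len(child_axis(document, "node()")))                                   # 3
print([n.data for n in child_axis(document, "comment()")])                   # ['This is a comment!']
print([n.target for n in child_axis(document, "processing-instruction()")])  # ['xml-stylesheet']
print([n.data for n in child_axis(element, "text()")])                       # ['Please use gift wrapping!']
```
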
Note

XPath 1.0 already defines 27 functions in its core function library. Beginning with XPath 2.0, there are 111 functions available for use in XPath expressions. You’ll find an overview in the W3C Recommendation XPath and XQuery Functions and Operators 3.1 of March 21, 2017.

Node Test with Wildcard

If you use the wildcard * (asterisk) as the node test, all nodes on the selected axis that are of the axis’ principal node type are selected. For most axes, the principal node type is the element node. The exceptions are the attribute and namespace axes, whose principal node types are attribute nodes and namespace nodes, respectively.

The following location path, for example, selects all attributes of the current context node:

attribute::*
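
In ElementTree, the two most common wildcard cases look as follows: attrib holds all attributes of an element, and findall("*") returns only the child elements, which is the principal node type of the child axis. A sketch using the order element from the example:

```python
import xml.etree.ElementTree as ET

order = ET.fromstring(
    '<order date="2019-02-01">'
    "<address/><comment>Please use gift wrapping!</comment><items/>"
    "</order>"
)
# attribute::*  -> all attributes of the context node
print(dict(order.attrib))                   # {'date': '2019-02-01'}
# child::*  (or simply *) -> all child nodes of the principal node type: elements
print([e.tag for e in order.findall("*")])  # ['address', 'comment', 'items']
```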

Shortened Notation

For frequently used axes and location steps, abbreviations are defined that can be used in XPath expressions as an alternative to the full notation.

Full notation | Abbreviation | Example
--- | --- | ---
child:: | (omitted) | child is the default axis, so the axis name can be omitted. The location path child::book/child::title therefore corresponds to the abbreviated form book/title.
attribute:: | @ | The attribute axis, including the separator, can be abbreviated with the @ symbol. The location path book/attribute::isbn selects the isbn attribute node of the book element; in abbreviated form it reads book/@isbn.
/descendant-or-self::node()/ | // | The location step /descendant-or-self::node()/ selects the document node and all of its descendants and is abbreviated as //. Instead of /descendant-or-self::node()/child::item, you can write //item. This location path selects all item nodes in the document.
parent::node() | .. | The location step parent::node() selects the parent node of the context node and is abbreviated with two dots (..).
self::node() | . | The location step self::node() selects the current context node and is abbreviated with a single dot (.).
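
Because the abbreviations are purely notational, they can be expanded mechanically. The hypothetical function expand_abbreviations below rewrites the shortcuts from the table into the full notation. It assumes paths without predicates, since a slash or @ inside a predicate would be handled incorrectly.

```python
def expand_abbreviations(path):
    """Rewrite abbreviated XPath syntax into the full notation (sketch, no predicates)."""
    path = path.replace("//", "/descendant-or-self::node()/")
    steps = []
    for step in path.split("/"):
        if step == "" or "::" in step:  # leading "/" or a step that already names its axis
            steps.append(step)
        elif step == "..":
            steps.append("parent::node()")
        elif step == ".":
            steps.append("self::node()")
        elif step.startswith("@"):
            steps.append("attribute::" + step[1:])
        else:
            steps.append("child::" + step)  # child is the default axis
    return "/".join(steps)


print(expand_abbreviations("book/title"))  # child::book/child::title
print(expand_abbreviations("book/@isbn"))  # child::book/attribute::isbn
print(expand_abbreviations("//item"))      # /descendant-or-self::node()/child::item
```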

Predicates

Predicates define further filter criteria for the node set selected through the axis and node test.

Predicates are the optional third part of a location step and are written in square brackets. The filter criteria within the brackets are formulated as expressions, which can contain path expressions, functions, operators and strings, among other things.

The XPath syntax supports universal predicates and numerical predicates.

Universal Predicates

Expressions in universal predicates filter the node set selected through the axis and node test by evaluating to a Boolean value (true or false) for each node in the selection. All nodes for which the value is true are part of the result set.

Expressions for universal predicates are formulated with the help of operators. They are used to select nodes with specific content or properties, for example all nodes that contain a certain string, a certain attribute value or a certain child element (possibly at a certain position).

The following tables give an overview of the available operators, divided into arithmetic, relational and logical operators.

Arithmetic operator | Function
--- | ---
+ | Addition
- | Subtraction
* | Multiplication
div | Division (floating-point)
mod | Modulo (remainder of a division)

Relational operator | Function
--- | ---
= | Equal
!= | Not equal
< | Less than; must be escaped as &lt; within XSLT
> | Greater than; escaping as &gt; is recommended within XSLT
<= | Less than or equal; must be escaped as &lt;= within XSLT
>= | Greater than or equal; escaping as &gt;= is recommended within XSLT

Logical operator | Function
--- | ---
and | Logical AND
or | Logical OR
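
ElementTree cannot evaluate relational operators inside predicates, so a comparison such as //book[priceus > 20]/title has to be written out in Python. The sketch converts the price text to a number first, as XPath 1.0 does when comparing a node set with a number; the trimmed XML keeps only the title and price of each book. It also shows that XPath’s mod behaves like Python’s math.fmod (the result takes the sign of the dividend), not like the % operator.

```python
import math
import xml.etree.ElementTree as ET

ITEMS_XML = """<items>
    <book><title>Harry Potter and the Prisoner of Azkaban</title><priceus>22.94</priceus></book>
    <book><title>The Lord of the Rings</title><priceus>17.74</priceus></book>
</items>"""

items = ET.fromstring(ITEMS_XML)
# Equivalent of //book[priceus > 20]/title, with the comparison done in Python
expensive = [
    book.findtext("title")
    for book in items.iter("book")
    if float(book.findtext("priceus")) > 20
]
print(expensive)  # ['Harry Potter and the Prisoner of Azkaban']

# XPath 1.0 mod returns the remainder of a truncating division, like math.fmod
print(math.fmod(5, 2), math.fmod(-5, 2))  # 1.0 -1.0
```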

In the following example, the predicate [title="Harry Potter and the Prisoner of Azkaban"] restricts the result set to the element node book that contains the child element title with the string Harry Potter and the Prisoner of Azkaban.

Note

The example corresponds to XPath 3 syntax, which may not be supported by all online tools. You can reproduce the query shown here with the following online tester, for example: http://videlibri.sourceforge.net/cgi-bin/xidelcgi.

/order/items/book[title="Harry Potter and the Prisoner of Azkaban"]

We have now chosen the element node book, which contains the data for the Harry Potter book.

<book isbn="9781408845660">
    <title>Harry Potter and the Prisoner of Azkaban</title>
    <quantity>1</quantity>
    <priceus>22.94</priceus>
    <comment>Please confirm delivery date before Christmas.</comment>
</book>

Another child element of this element node is comment. To select its content, the location path only needs to be extended by two location steps:

/order/items/book[title="Harry Potter and the Prisoner of Azkaban"]/comment/text()

The location step comment (abbreviated form of child::comment) navigates to the child element of book with this name, and the text() function selects its text node. The result is the following string:

Please confirm delivery date before Christmas.
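
ElementTree’s limited XPath subset supports this kind of predicate ([tag='text']), but paths are written relative to the root element, because a leading /order is not accepted. The .text attribute stands in for text(). A sketch with a trimmed copy of the example document:

```python
import xml.etree.ElementTree as ET

ORDER_XML = """<order date="2019-02-01">
    <items>
        <book isbn="9781408845660">
            <title>Harry Potter and the Prisoner of Azkaban</title>
            <comment>Please confirm delivery date before Christmas.</comment>
        </book>
        <book isbn="9780544003415">
            <title>The Lord of the Rings</title>
        </book>
    </items>
</order>"""

root = ET.fromstring(ORDER_XML)  # root is the order element
# Counterpart of /order/items/book[title="..."], relative to order
book = root.find("items/book[title='Harry Potter and the Prisoner of Azkaban']")
print(book.get("isbn"))           # 9781408845660
# Counterpart of .../comment/text()
print(book.find("comment").text)  # Please confirm delivery date before Christmas.
```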

If a predicate contains only a path expression, it is called an existence test. The following location path, for example, tests whether the XML document presented above contains book elements with one or more child elements named comment.

Shortened notation:

//book[comment]

Standard notation:

/descendant-or-self::node()/child::book[child::comment]

The location path //book[comment] selects all elements named book that have a child element named comment.
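
ElementTree supports existence tests of the form [tag] as well. The sketch runs the equivalent of //book[comment] on the items element of the example (ElementTree’s .// starts at the context element rather than at the document node) and, for comparison, spells out the same test in plain Python.

```python
import xml.etree.ElementTree as ET

ITEMS_XML = """<items>
    <book isbn="9781408845660">
        <title>Harry Potter and the Prisoner of Azkaban</title>
        <comment>Please confirm delivery date before Christmas.</comment>
    </book>
    <book isbn="9780544003415">
        <title>The Lord of the Rings</title>
    </book>
</items>"""

items = ET.fromstring(ITEMS_XML)
with_comment = items.findall(".//book[comment]")
print([book.get("isbn") for book in with_comment])  # ['9781408845660']
# The same existence test in plain Python: books with at least one comment child
print([b.get("isbn") for b in items.iter("book") if b.find("comment") is not None])
```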

Numerical Predicates

Numerical predicates let you address nodes by their position. The following location path, for example, selects every book element that is the second book child of its parent. In the example document, both books have the same parent (items), so the result is the second book in document order:

//book[2]

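Strictly speaking, the predicate [2] is the abbreviated form of [position()=2]. For each parent node, XPath first selects the child elements named book and then keeps the one for which position()=2 yields the Boolean value true. To select the second book element in the entire document regardless of its parent, enclose the path in parentheses: (//book)[2].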

Note

Unlike in most programming languages, where indexing starts at 0, XPath numbering begins with 1.
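
ElementTree’s XPath subset also counts positions from 1 and, like //book[2], counts among siblings with the same tag and parent. A global position such as (//book)[2] is not supported there, but a Python list of all book elements gives the same result at index 1, because Python counts from 0. A sketch on a trimmed copy of the example:

```python
import xml.etree.ElementTree as ET

ORDER_XML = """<order>
    <items>
        <book><title>Harry Potter and the Prisoner of Azkaban</title></book>
        <book><title>The Lord of the Rings</title></book>
    </items>
</order>"""

root = ET.fromstring(ORDER_XML)
# //book[2]: position 2 among the book children of each parent (1-based)
print([b.findtext("title") for b in root.findall(".//book[2]")])       # ['The Lord of the Rings']
print([b.findtext("title") for b in root.findall(".//book[last()]")])  # ['The Lord of the Rings']
# (//book)[2]: second book in the whole document; Python list index 1 (0-based)
print(list(root.iter("book"))[1].findtext("title"))                    # The Lord of the Rings
```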

Additional Information on XML Path Language

On the W3C website, you’ll find an overview of the current development status of the XML Path Language as well as all released standards and drafts.

Free information and tools for using XPath in web applications are available in MDN Web Docs and in the Microsoft Developer Network.
