XML documents must be _well-formed_. Well-formed XML is more restrictive than HTML: elements must be properly nested, every start tag must have a matching end tag, and attribute names within a start tag must be unique, among other things. XML documents can also include a _document type definition_ (DTD). For example, the DTD declaration "<!ELEMENT list (listitem*)>" specifies that a "list" element contains zero or more "listitem" elements and nothing else. _Validation_, or checking that an XML document follows the rules of a DTD, helps ensure that only meaningful data reaches an application.
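As an illustration of the well-formedness rules above, the following minimal sketch uses the parser in Python's standard library to accept or reject small documents. The helper name "is_well_formed" and the sample strings are ours, chosen only for illustration. The sketch checks well-formedness only; this parser does not validate against a DTD, so validation would need a validating parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as well-formed XML, False otherwise.

    Hypothetical helper for illustration; it does NOT perform DTD validation.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for mismatched tags, duplicate attributes, unclosed elements, etc.
        return False


# Illustrative inputs, one per rule mentioned in the text.
samples = {
    "properly nested": "<list><listitem>a</listitem></list>",
    "mismatched tags": "<list><listitem>a</list></listitem>",
    "duplicate attribute": '<list id="1" id="2"/>',
    "unclosed element": "<list><listitem>a</listitem>",
}

if __name__ == "__main__":
    for label, text in samples.items():
        print(label, "->", is_well_formed(text))
```

Only the first sample is accepted; each of the others violates one of the constraints listed above and is rejected by the parser before any application code sees the data.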
The XML standard fixes many aspects of the behavior of compliant applications when processing XML, so there are many libraries that perform routine XML parsing tasks and report only important information to applications. Most XML parsing libraries use one of two interfaces: the Simple API for XML (SAX) [17] and the Document Object Model (DOM) [6]. SAX is an event-based API, suitable for one-pass algorithms such as search tools and filters. DOM provides an interface to XML data stored in memory as a tree, and is better suited to multi-pass algorithms.
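The difference between the two interfaces can be sketched with the SAX and DOM implementations in Python's standard library. The sample document, the "ItemCounter" handler, and the two function names are assumptions made for this illustration and do not come from any particular application. The SAX version sees each element once as an event and keeps only a counter, while the DOM version builds the whole tree in memory and can then be queried repeatedly.

```python
import xml.sax
from xml.dom import minidom

# Illustrative document matching the "list"/"listitem" example.
DOC = "<list><listitem>a</listitem><listitem>b</listitem></list>"


class ItemCounter(xml.sax.ContentHandler):
    """SAX handler: reacts to start-element events in a single pass."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "listitem":
            self.count += 1


def count_items_sax(text):
    """One-pass, event-based count; no tree is kept in memory."""
    handler = ItemCounter()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.count


def item_texts_dom(text):
    """Tree-based access; the parsed document can be traversed many times."""
    doc = minidom.parseString(text)
    items = doc.getElementsByTagName("listitem")
    # Assumes every listitem holds a single text node, as in DOC.
    return [item.firstChild.data for item in items]


if __name__ == "__main__":
    print(count_items_sax(DOC))
    print(item_texts_dom(DOC))
```

A filter or search tool needs only the handler's callbacks, whereas an algorithm that revisits nodes would typically keep the DOM tree and pay the memory cost of storing it.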
Many additional tools, standards, and technologies are in development by the W3C and the Web community, but they are not relevant to this paper. For more information, see the W3C web site's XML page, "http://www.w3c.org/XML".