next up previous
Next: Compressing XML as text Up: Compressing XML with Multiplexed Previous: Introduction

  
XML background

Superficially, XML documents look a lot like HTML documents (see Figure 1). XML documents contain _element tags_, including start tags like "<title>" and end tags like "</title>". Elements can contain other elements nested inside them, forming a tree structure. Elements can also contain plain text, comments, and special instructions for XML processors ("processing instructions"). Start tags can have associated _attributes_, such as "type" in "<list type="ordered">". These are the most common constructs in XML; for more detail see the specification [7].
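As a rough illustration of these constructs, the following sketch parses a small document and counts each kind of node. It uses Python's standard xml.dom.minidom module; the sample document, the "render" processing instruction, and the function name summarize are assumptions made for this example and do not come from the paper.

```python
from xml.dom import minidom, Node

# Hypothetical sample document: elements, one attribute, text,
# a comment, and a processing instruction.
DOC = """<?xml version="1.0"?>
<?render mode="draft"?>
<article>
  <title>XML background</title>
  <!-- an ordered list with two items -->
  <list type="ordered">
    <listitem>first</listitem>
    <listitem>second</listitem>
  </list>
</article>"""


def summarize(node, counts=None):
    """Walk the tree rooted at node and count each kind of construct."""
    if counts is None:
        counts = {"element": 0, "attribute": 0, "text": 0, "comment": 0, "pi": 0}
    if node.nodeType == Node.ELEMENT_NODE:
        counts["element"] += 1
        counts["attribute"] += len(node.attributes)
    elif node.nodeType == Node.TEXT_NODE:
        # Ignore whitespace-only text used for indentation.
        if node.data.strip():
            counts["text"] += 1
    elif node.nodeType == Node.COMMENT_NODE:
        counts["comment"] += 1
    elif node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        counts["pi"] += 1
    for child in node.childNodes:
        summarize(child, counts)
    return counts


if __name__ == "__main__":
    print(summarize(minidom.parseString(DOC)))
```

The XML declaration on the first line is not reported as a processing instruction, so only the "render" instruction is counted.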

XML documents must be _well-formed_. Well-formed XML is more restrictive than HTML: elements must be properly nested, start tags must match corresponding end tags, and attribute names within a start tag must be unique, among other things. XML documents can also include a _document type definition_ (DTD). For example, the DTD statement "<!ELEMENT list (listitem*)>" specifies that "list" elements can contain sequences of "listitem" elements. _Validation_, or checking that an XML document follows the rules of a DTD, ensures that only meaningful data reaches an application.
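The distinction between well-formedness and validity can be sketched with Python's standard xml.etree.ElementTree module, whose parser checks well-formedness but does not validate against a DTD. The helper name is_well_formed and the sample strings are assumptions for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    samples = [
        # Well-formed: properly nested, matching tags.
        '<list type="ordered"><listitem>a</listitem></list>',
        # Not well-formed: end tags do not match the nesting.
        '<list><listitem>a</list></listitem>',
        # Not well-formed: duplicate attribute name in one start tag.
        '<list type="a" type="b"/>',
        # Well-formed but invalid against its own DTD: this parser
        # does not validate, so it is still accepted.
        '<!DOCTYPE list [<!ELEMENT list (listitem*)>]><list><title/></list>',
    ]
    for s in samples:
        print(is_well_formed(s), s)
```

The last sample is accepted because the parser is non-validating; rejecting it would require a validating parser, which the Python standard library does not provide.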

The XML standard fixes many aspects of the behavior of compliant applications when processing XML, so there are many libraries that perform routine XML parsing tasks and report only important information to applications. Most XML parsing libraries use one of two interfaces: the Simple API for XML (SAX) [17] or the Document Object Model (DOM) [6]. SAX is an event-based API, suitable for one-pass algorithms such as search tools and filters. DOM provides an interface to XML data stored in memory as trees, and is better suited to multi-pass algorithms.
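The difference between the two interfaces can be sketched with the SAX and DOM implementations in Python's standard library (xml.sax and xml.dom.minidom). The sample document and the names TagCounter, count_tags_sax, and item_texts_dom are assumptions for this example, not part of either API.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document.
DOC = ('<list type="ordered">'
       '<listitem>first</listitem><listitem>second</listitem>'
       '</list>')


class TagCounter(xml.sax.ContentHandler):
    """SAX handler: receives one event per start tag, keeps no tree."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


def count_tags_sax(text):
    """One pass over the document, counting start tags by name."""
    handler = TagCounter()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.counts


def item_texts_dom(text):
    """Build the whole tree in memory, then query it."""
    doc = minidom.parseString(text)
    return [item.firstChild.data
            for item in doc.getElementsByTagName("listitem")]


if __name__ == "__main__":
    print(count_tags_sax(DOC))
    print(item_texts_dom(DOC))
```

The SAX version never holds more than the current event, which is what makes it suitable for one-pass processing of large inputs; the DOM version keeps the full tree, so it can be traversed repeatedly at the cost of memory.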

The W3C and the Web community are developing many additional tools, standards, and technologies, but they are not relevant to this paper. For more information see the W3C web site's XML page, "http://www.w3c.org/XML".


James Cheney
2000-11-24