Home Page


Sparta is a lightweight Java XML package that includes an XML parser, a DOM, and an XPath interpreter. Its code size is small, its parser is fast, its in-memory object footprint is small, and its DOM API is clean and simple.

Thermopylae is a wrapper around Sparta that allows it to be used as a drop-in replacement for Xerces. It presents a W3C-standard parser and DOM.

Sparta was originally written at Hewlett-Packard Labs to support a project creating an infrastructure for building agile applications that distribute themselves throughout the global Internet and aggregate ensembles of local appliances.

Current Status

We have transferred the source code from our internal HP Labs CVS repository to SourceForge, and the documentation from our internal wiki. The next step is to transfer our JUnit testing environment as well.

Quick Start

Download Sparta.

See Sparta Example for some Java code that uses Sparta, and Porting to Sparta to see how to port standard DOM code to Sparta.

If you do download Sparta, please add a note to the discussion forum. Thanks.

General Features

Sparta has a simple, fast parser.
The parser does not validate DTDs.
The parser does not handle external entities.
Sparta ignores namespaces: <a:Foo> and <b:Foo> are treated as if ":" were an ordinary tag-name character. The two are therefore considered different tags, even though, strictly speaking, the a: and b: prefixes may be bound to the same namespace URI. (The Thermopylae wrapper does, however, support namespaces.)
The DOM is a simplification of the W3C DOM API.[1] It uses the same class and method names as the W3C DOM where possible, but it is independent of the W3C DOM and does not implement its interfaces. Instead of interfaces and factories, it uses a few simple concrete classes. (The Thermopylae wrapper does provide a W3C interface.)
Sparta is implemented using only JDK 1.x APIs so that it can run on J2ME devices. This means, for example, that it uses the old java.util classes (Vector, Hashtable, Enumeration) instead of the newer Collections Framework types (List, Map, Iterator).
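The namespace behavior described in the list above can be demonstrated with the JDK's standard W3C DOM parser. This sketch does not use Sparta's API: a namespace-unaware parse stands in for Sparta and compares raw prefixed tag names, while a namespace-aware parse stands in for Thermopylae and compares namespace URI plus local name. The class NamespaceDemo and the URI urn:example are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class NamespaceDemo {
    // Both prefixes are bound to the same illustrative URI, urn:example.
    static final String XML =
        "<root xmlns:a='urn:example' xmlns:b='urn:example'><a:Foo/><b:Foo/></root>";

    // Parses XML with or without namespace processing (checked exceptions are wrapped).
    static Document parse(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Identity used to compare tags: the raw prefixed name when namespaces are ignored,
    // {namespaceURI}localName when they are processed.
    static String tagKey(Element e, boolean namespaceAware) {
        return namespaceAware
            ? "{" + e.getNamespaceURI() + "}" + e.getLocalName()
            : e.getTagName();
    }

    // Reports whether <a:Foo> and <b:Foo> are treated as the same tag.
    static boolean sameTag(boolean namespaceAware) {
        NodeList foos = parse(XML, namespaceAware).getDocumentElement().getElementsByTagName("*");
        return tagKey((Element) foos.item(0), namespaceAware)
            .equals(tagKey((Element) foos.item(1), namespaceAware));
    }

    public static void main(String[] args) {
        System.out.println("namespaces ignored (Sparta-style): same tag? " + sameTag(false));
        System.out.println("namespaces processed (W3C-style): same tag? " + sameTag(true));
    }
}
```

Running main should report that the two tags differ when namespaces are ignored and match when they are processed, because both prefixes are bound to the same URI.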

To allow Sparta code to interoperate with other XML code, there is also ThermopylaeXml, a standard DOM wrapper around Sparta.

The Sparta XPath expression language is a subset of the full XPath grammar.[2] It supports only the abbreviated syntax,[3] which in practice is what most people use. XPath handling is integrated into the DOM via the xpathSelect* methods of the com.hp.hpl.sparta.Document and com.hp.hpl.sparta.Element classes.
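To make "abbreviated syntax" concrete, the following sketch evaluates a few abbreviated expressions alongside their unabbreviated XPath 1.0 equivalents. It uses the JDK's javax.xml.xpath engine rather than Sparta's xpathSelect* methods, on the same sample document used in the toString example on this page. The class name AbbreviatedXPathDemo is hypothetical, and which abbreviated forms Sparta accepts beyond these is not established here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class AbbreviatedXPathDemo {
    static final String XML = "<A>1<BB>2<ccc>3</ccc>4<ccc/>5</BB>6<BB>7</BB>8</A>";

    // Each row pairs an abbreviated expression with its unabbreviated XPath 1.0 form.
    static final String[][] PAIRS = {
        {"/A/BB", "/child::A/child::BB"},
        {"//ccc", "/descendant-or-self::node()/child::ccc"},
        {"/A/BB[2]", "/child::A/child::BB[position()=2]"},
    };

    // Parses an XML string into a W3C DOM document (checked exceptions are wrapped).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Evaluates an expression with the JDK's XPath engine and returns the node count.
    static int count(Document doc, String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        for (String[] pair : PAIRS) {
            System.out.println(pair[0] + " selects " + count(doc, pair[0])
                + " node(s); " + pair[1] + " selects " + count(doc, pair[1]));
        }
    }
}
```

Each pair should select the same nodes: both BB elements, both ccc elements, and the second BB element, respectively.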

The toString method of a DOM node returns the concatenation of all text nodes beneath that node. This provides convenient functionality similar to the string value of a node in XSLT (for example, as output by xsl:value-of). Thus, if a document contains the XML "<A>1<BB>2<ccc>3</ccc>4<ccc/>5</BB>6<BB>7</BB>8</A>", then doc.xpathSelectElement("/A/BB").toString() returns "2345", the text under the first matching BB element.
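The concatenation rule can be sketched against the standard W3C DOM, assuming Sparta's toString behaves as described above. TextValueDemo is a hypothetical helper, not part of Sparta; it walks the subtree in document order and appends only text and CDATA content.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class TextValueDemo {
    static final String XML = "<A>1<BB>2<ccc>3</ccc>4<ccc/>5</BB>6<BB>7</BB>8</A>";

    // Parses an XML string into a W3C DOM document (checked exceptions are wrapped).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Concatenates the text of every text or CDATA node beneath `node`, in document order.
    static String textValue(Node node) {
        StringBuilder sb = new StringBuilder();
        collect(node, sb);
        return sb.toString();
    }

    private static void collect(Node node, StringBuilder sb) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            sb.append(node.getNodeValue());
            return;
        }
        // Recurse into children; comment and processing-instruction values are never appended.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, sb);
        }
    }

    // Text value of the first BB element, analogous to xpathSelectElement("/A/BB").toString().
    static String firstBB() {
        return textValue(parse(XML).getElementsByTagName("BB").item(0));
    }

    public static void main(String[] args) {
        System.out.println(firstBB());                                  // prints 2345
        System.out.println(textValue(parse(XML).getDocumentElement())); // prints 12345678
    }
}
```

For element nodes, DOM Level 3's Node.getTextContent() returns the same concatenation, so with a modern W3C DOM the hand-written walk mainly serves to spell the rule out.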

Design and Implementation

XML Parser and DOM

com.hp.hpl.sparta

CVS path: sparta/java/com/hp/hpl/agile/sparta

XPath

The XPath code is split into two parts:

A parser that creates a parse tree: com.hp.hpl.sparta.xpath
A visitor that crawls over the parse tree, evaluating it against DOM nodes. There are two implementations of this visitor:
1. A package-private visitor class that evaluates with respect to Sparta DOM nodes.
2. A package-private visitor class that evaluates with respect to standard W3C DOM nodes.
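The parser/visitor split described above can be sketched in miniature. This is a hypothetical illustration, not Sparta's actual classes: PathExpr, PathVisitor, and W3cEvaluator are invented names, the "parser" accepts only absolute paths made of child-element names (or *), and the single visitor evaluates against W3C DOM nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Demo entry point; declared first so the single-file launcher finds main().
class VisitorDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Builds the parse tree once, then lets a W3C-DOM visitor evaluate it.
    static int count(String xml, String path) {
        W3cEvaluator evaluator = new W3cEvaluator(parse(xml));
        PathExpr.parse(path).accept(evaluator);
        return evaluator.result().size();
    }

    public static void main(String[] args) {
        System.out.println(count("<A>1<BB>2<ccc>3</ccc>4<ccc/>5</BB>6<BB>7</BB>8</A>", "/A/BB/ccc"));
    }
}

// Hypothetical parse tree: an absolute path of child-axis steps, each an element name or "*".
final class PathExpr {
    final List<String> steps = new ArrayList<>();

    static PathExpr parse(String text) {
        if (!text.startsWith("/")) {
            throw new IllegalArgumentException("only absolute paths are supported: " + text);
        }
        PathExpr expr = new PathExpr();
        for (String step : text.substring(1).split("/", -1)) {
            if (step.isEmpty()) throw new IllegalArgumentException("empty step in: " + text);
            expr.steps.add(step);
        }
        return expr;
    }

    // Walks the tree, calling back into whichever visitor is supplied.
    void accept(PathVisitor visitor) {
        visitor.visitRoot();
        for (String step : steps) visitor.visitStep(step);
    }
}

// Hypothetical visitor interface; one implementation per DOM flavor.
interface PathVisitor {
    void visitRoot();
    void visitStep(String name);
}

// Visitor that evaluates the path against W3C DOM nodes, keeping a current node-set.
final class W3cEvaluator implements PathVisitor {
    private final Document doc;
    private List<Node> current = new ArrayList<>();

    W3cEvaluator(Document doc) { this.doc = doc; }

    public void visitRoot() {
        current = new ArrayList<>();
        current.add(doc);
    }

    public void visitStep(String name) {
        List<Node> next = new ArrayList<>();
        for (Node node : current) {
            for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c.getNodeType() == Node.ELEMENT_NODE
                        && (name.equals("*") || name.equals(c.getNodeName()))) {
                    next.add(c);
                }
            }
        }
        current = next;
    }

    List<Node> result() { return current; }
}
```

Because the parser only builds the tree and never touches a DOM, supporting another node model means writing another PathVisitor. Sparta's real visitors are presumably much richer (predicates, attributes, descendant steps), given the abbreviated syntax it supports.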

Performance

Below are some performance measures comparing Sparta and Thermopylae to two other popular Java XML parsers: Xerces and Crimson.

Conclusion: Sparta and Thermopylae both parse XML more than twice as fast as the Crimson parser and about five times as fast as the popular Xerces parser. Based on the serialization experiments, access to the Sparta DOM is 44% faster than access to the Crimson DOM and more than three times faster than access to the Xerces DOM. Access to the Thermopylae DOM is slower than access to the Crimson DOM (0.49 versus 0.40 seconds). Additionally, when we increased the size of the XML being parsed, the Xerces parser ran out of memory.

Parsing from bytes to DOM:

Sparta: 2.2 seconds ⇒ 20,000 lines/second
Thermopylae: 2.4 seconds ⇒ 18,000 lines/second
Crimson: 5.2 seconds ⇒ 8,200 lines/second
Xerces: 12 seconds ⇒ 3,500 lines/second

Serialization from DOM to String:

Sparta: 0.28 seconds ⇒ 160,000 lines/second
Thermopylae: 0.49 seconds ⇒ 88,000 lines/second
Crimson: 0.40 seconds ⇒ 110,000 lines/second
Xerces: 0.98 seconds ⇒ 44,000 lines/second

Details: The times measured are wall-clock times on a Pentium III HP OmniBook 6000 laptop and do not include reading from the file. The XML parsed was 43,000 lines taken from those files in the XMLConf test suite[4] that do not contain deliberate syntax or well-formedness errors. Each result above is the mean of at least five runs.
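The measurement method in the Details paragraph (elapsed wall-clock time, file reading excluded, mean of several runs) can be sketched as a small harness. ParseTimer is a hypothetical class that uses the JDK's default JAXP parser as a stand-in; timing Sparta, Crimson, or Xerces would mean replacing parse() with that parser's own entry point. It is not the harness that produced the figures above.

```java
import java.io.ByteArrayInputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ParseTimer {
    // Parses bytes that are already in memory, so no file I/O is timed.
    static Document parse(byte[] xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Mean elapsed seconds over `runs` bytes-to-DOM parses.
    static double meanParseSeconds(byte[] xml, int runs) {
        if (runs < 1) throw new IllegalArgumentException("runs must be at least 1");
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            parse(xml);
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1e9 / runs;
    }

    // Counts '\n' bytes, plus one for a final line that lacks a trailing newline.
    static int countLines(byte[] xml) {
        int lines = 0;
        for (byte b : xml) {
            if (b == '\n') lines++;
        }
        if (xml.length > 0 && xml[xml.length - 1] != '\n') lines++;
        return lines;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java ParseTimer.java file.xml");
            return;
        }
        byte[] xml = Files.readAllBytes(Paths.get(args[0])); // read once, outside the timed loop
        double seconds = meanParseSeconds(xml, 5);            // mean of five runs
        System.out.printf("%.2f seconds => %.0f lines/second%n", seconds, countLines(xml) / seconds);
    }
}
```

JVM warm-up (class loading, JIT compilation) can noticeably slow the first runs. The page does not say whether the original measurements discarded warm-up runs, so any comparison should apply the same policy to every parser.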

(Naming note: The Greek city-state of Sparta was a lean-and-mean rival of King Xerxes of the Persian Empire. Xerces, which differs from the king's name by one letter, is also the name of the most common full XML parser and DOM.[5] Thermopylae is the battle in which Xerxes defeated the Spartans.)