Lex Diamonds Blog: Task 8: Evaluate SAX (the XML API, giving examples of usage.

Understanding Event-Driven XML Processing

SAX (Simple API for XML) is an event-driven model for processing XML. Most XML processing models (for example: DOM and XPath) build an internal, tree-shaped representation of the XML document. The developer then uses that model's API (getElementsByTagName in the case of the DOM or findnodes using XPath, for example) to access the contents of the document tree. The SAX model is quite different. Rather than building a complete representation of the document, a SAX parser fires off a series of events as it reads the document from beginning to end. Those events are passed to event handlers, which provide access to the contents of the document.

Event Handlers

There are three classes of event handlers: DTDHandlers, for accessing the contents of XML Document-Type Definitions; ErrorHandlers, for low-level access to parsing errors; and, by far the most often used, DocumentHandlers, for accessing the contents of the document. For clarity's sake, I'll only cover DocumentHandler events.

A SAX processor will pass the following events to a DocumentHandler:

The start of the document.
A processing instruction element.
A comment element.
The beginning of an element, including that element's attributes.
The text contained within an element.
The end of an element.
The end of the document.

WHAT IS SAX EXACTLY and Which Parser to Use (SAX or DOM)

SAX chooses to give you access to the information in your XML document, not as a tree of nodes, but as a sequence of events! You ask, how is this useful? The answer is that SAX chooses not to create a default Java object model on top of your XML document (like DOM does). This makes SAX faster, and also necessitates the following things:

creation of your own custom object model
creation of a class that listens to SAX events and properly creates your object model.

Note that these steps are not necessary with DOM, because DOM already creates an object model for you (which represents your information as a tree of nodes).

In the case of DOM, the parser does almost everything, read the XML document in, create a Java object model on top of it and then give you a reference to this object model (a Document object) so that you can manipulate it. SAX is not called the Simple API for XML for nothing, it is really simple.

SAX doesn’t expect the parser to do much, all SAX requires is that the parser should read in the XML document, and fire a bunch of events depending on what tags it encounters in the XML document. You are responsible for interpreting these events by writing an XML document handler class, which is responsible for making sense of all the tag events and creating objects in your own object model. So you have to write:

Your custom object model to “hold” all the information in your XML document into.

A document handler that listens to SAX events (which are generated by the SAX parser as its reading your XML document) and makes sense of these events to create objects in your custom object model.

SAX can be really fast at runtime if your object model is simple. In this case, it is faster than DOM, because it bypasses the creation of a tree based object model of your information. On the other hand, you do have to write a SAX document handler to interpret all the SAX events (which can be a lot of work).

What kinds of SAX events are fired by the SAX parser? These events are really very simple. SAX will fire an event for every open tag, and every close tag. It also fires events for #PCDATA and CDATA sections. You document handler (which is a listener for these events) has to interpret these events in some meaningful way and create your custom object model based on them. Your document handler will have to interpret these events and the sequence in which these events are fired is very important. SAX also fires events for processing instructions, DTDs, comments, etc. But the idea is still the same, your handler has to interpret these events (and the sequence of the events) and make sense out of them.

When to Use DOM

If your XML documents contain document data (e.g., Framemaker documents stored in XML format), then DOM is a completely natural fit for your solution. If you are creating some sort of document information management system, then you will probably have to deal with a lot of document data.

An example of this is the Datachannel RIO product, which can index and organize information that comes from all kinds of document sources (like Word and Excel files). In this case, DOM is well suited to allow programs access to information stored in these documents.

However, if you are dealing mostly with structured data (the equivalent of serialized Java objects in XML) DOM is not the best choice. That is when SAX might be a better fit.

In my Conclusion

The SAX document handler you write does element to object mapping. If your information is structured in a way that makes it easy to create this mapping you should use the SAX API. On the other hand, if your data is much better represented as a tree then you should use DOM.

Referece: Simple API for XML (SAX) (2010)[Online] Viewed At: http://www.ibm.com/developerworks/xml/standards/x-saxspec.html [Accessed 26/12/2010]

Lex Diamonds Blog

Pages

Thursday 30 December 2010

Task 8: Evaluate SAX (the XML API, giving examples of usage.

No comments:

Post a Comment