Tuesday, August 25, 2009

XPath for GWT

Node selection in XML on the client
As an AJAX toolkit, Google Web Toolkit (GWT) supports retrieving XML documents from the server and supplies an XML parser for extracting information from those documents. However, navigating the XML tree through the DOM is tedious. Some sort of selection functionality would be useful.

GWT provides an XML parser but no XPath selection API, even though most browsers implement XPath selection natively. Searching for an XPath library for GWT turns up jaxen4gwt and XSLTForGWT. jaxen4gwt seems to be incomplete and is not a wrapper around the browser implementation, so it will perform relatively poorly (see Eric Bessette's XSLTForGWT blog post) compared to a browser-native XPath implementation. XSLTForGWT instead wraps the Sarissa JavaScript library in GWT. Sarissa in turn wraps the native XPath implementations (with XSLT as a bonus).

The Problem
Sarissa succeeds in smoothing over *most* of the oddities among the browsers, at the cost of some extra configuration for IE (from the HOWTO):
Actually IE also needs the proprietary setProperty method for it's XPath implementation to work.
To demonstrate:
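A rough sketch of that configuration, not the HOWTO's own listing: it assumes an MSXML-style document whose setProperty accepts the "SelectionLanguage" and "SelectionNamespaces" properties. The helper name declareXPathNamespaces and the stand-in document object are hypothetical and exist only so the snippet can run outside IE.

```javascript
// Hypothetical helper: applies the two IE-specific XPath settings.
// Assumes doc.setProperty exists (natively in IE's MSXML documents,
// or as a no-op emulation elsewhere).
function declareXPathNamespaces(doc, namespaceDeclarations) {
  // Use XPath as the selection language for selectNodes/selectSingleNode.
  doc.setProperty("SelectionLanguage", "XPath");
  // Declare every prefix the query uses, as space-separated xmlns attributes.
  doc.setProperty("SelectionNamespaces", namespaceDeclarations);
}

// Stand-in for an MSXML document; it only records what was set.
const fakeIeDoc = {
  properties: {},
  setProperty(name, value) {
    this.properties[name] = value;
  }
};

const declarations =
  "xmlns:exist='http://exist.sourceforge.net/NS/exist' " +
  "xmlns:tei='http://www.tei-c.org/ns/1.0'";

declareXPathNamespaces(fakeIeDoc, declarations);
console.log(fakeIeDoc.properties);
```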
The namespaces used in the XPath query must be declared prior to executing the query. These functions are no-ops for other browsers, resulting in cross-browser support. The HOWTO continues:
Mozilla does not need any of the above. DOM L3 XPath is always available and namespaces are resolved err... automatically.
This is not quite true. If the namespaces are declared on the root node of the XML, Mozilla will resolve them automatically. However, if a namespace is declared in a lower scope and is used in the XPath query, the following cryptic error appears in the JavaScript console:
Error: uncaught exception: [Exception... "An attempt was made to create or change an object in a way which is incorrect with regard to namespaces"  code: "14" nsresult: "0x8053000e (NS_ERROR_DOM_NAMESPACE_ERR)"  location: "sarissa_ieemu_xpath.js Line: 159"]
Namespaces used in the query that Mozilla auto-detects:
<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist" xmlns:tei="http://www.tei-c.org/ns/1.0">
Searching for "/exist:result/tei:test" correctly returns the single node "tei:test". However, in the following case, the very same query will result in an error on the console.
<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist">
  <tei:test xmlns:tei="http://www.tei-c.org/ns/1.0"></tei:test>
An XPath query on the above XML that uses the namespace "tei" results in the cryptic error NS_ERROR_DOM_NAMESPACE_ERR.
As it turns out, Mozilla provides functionality similar to IE's "SelectionNamespaces" property. Sarissa wraps it in Sarissa.setXpathNamespaces(), which works much like its IE counterpart. However, the Sarissa documentation seems to indicate, incorrectly, that the call is unnecessary except when the "document features a default namespace", because "moz will rezolve non-default namespaces by itself". Using this function to declare the namespaces explicitly fixes the error, and the selection returns its result.

I therefore modified Eric Bessette's GWT wrapper library to take a string of the namespaces to be used and pass it to the setXpathNamespaces() function (which seems to cover IE's "SelectionNamespaces" property as well).
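To make the "string of namespaces" concrete, here is a minimal sketch of building such a string and of resolving prefixes from it. It assumes the same space-separated xmlns:prefix='uri' format that IE's "SelectionNamespaces" uses. The helpers buildNamespaceString and makeResolver are hypothetical names for illustration, not part of Sarissa or the GWT wrapper.

```javascript
// Hypothetical helper: turn a prefix -> URI map into a declaration string
// of the form "xmlns:a='uri-a' xmlns:b='uri-b'".
function buildNamespaceString(prefixToUri) {
  return Object.entries(prefixToUri)
    .map(([prefix, uri]) => `xmlns:${prefix}='${uri}'`)
    .join(" ");
}

// Hypothetical helper: parse a declaration string into a resolver function
// that maps a prefix to its URI, or to null when the prefix is unknown.
function makeResolver(namespaceString) {
  const table = {};
  const pattern = /xmlns:([\w.-]+)=(['"])(.*?)\2/g;
  let match;
  while ((match = pattern.exec(namespaceString)) !== null) {
    table[match[1]] = match[3];
  }
  return (prefix) =>
    Object.prototype.hasOwnProperty.call(table, prefix) ? table[prefix] : null;
}

const nsString = buildNamespaceString({
  exist: "http://exist.sourceforge.net/NS/exist",
  tei: "http://www.tei-c.org/ns/1.0"
});
const resolve = makeResolver(nsString);

console.log(nsString);
console.log(resolve("tei"));
```

Because the prefixes are looked up in an explicit table, it no longer matters whether the XML declares "tei" on the root node or further down. In a browser, a resolver function like this could be passed as the namespace resolver argument of document.evaluate; with Sarissa loaded, the string itself is what I would expect to hand to Sarissa.setXpathNamespaces() together with the document.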

Alternative Methods of Selection
Aside from using XPath, there are other selection/querying methods I considered.

While researching selection solutions, I looked at GWTQuery, a neat library that mimics jQuery syntax for selection. Incidentally, it uses XPath where available. Unfortunately, it is made for HTML and doesn't seem to support XML very well.

I also considered converting the XML to JsonML. There are two forms of JsonML ( http://en.wikipedia.org/wiki/JsonML#Syntax , http://tech.groups.yahoo.com/group/json/message/1115 ): Array Form and Object Form. Unfortunately, I found that selection in Array Form is still tedious. The first two elements might be the tag name and a dictionary of the attributes, but if there are no attributes, the second element is already a child node. This makes iterating through nodes just as tedious as using the DOM! At the very least, the format could keep an empty dictionary for the attributes, so that I don't have to test every node to see whether the second array element is a dictionary (and thus the attributes) or an element.

Object Form is more explicit: it keeps the attributes as a dictionary and reserves the keys "tagName" and "childNodes" for the name of the node and an array of the child nodes, respectively. In my opinion, a better solution would be for each node to be a dictionary with exactly three entries: a tagName entry containing the name of the node, an attributes entry containing a dictionary of attributes, and a childNodes entry containing an array of children. An example:

"childNodes": [
  {"tagName": "firstName", "attributes": {}, "childNodes": ["Robert"]},
  {"tagName": "lastName", "attributes": {}, "childNodes": ["Smith"]},
  {"tagName": "address", "attributes": {"type": "home"}, "childNodes": [
    {"tagName": "street", "attributes": {}, "childNodes": ["12345 Sixth Ave"]},
    {"tagName": "city", "attributes": {}, "childNodes": ["Anytown"]},
    {"tagName": "state", "attributes": {}, "childNodes": ["CA"]},
    {"tagName": "postalCode", "attributes": {}, "childNodes": ["98765-4321"]},
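The Array Form check described above, and the conversion into the three-entry shape, can be sketched as follows. This is an illustration under assumptions rather than a reference implementation: the function names isAttributes and normalize are hypothetical, and the sample input only mirrors part of the address element from the example.

```javascript
// True when a value is an attribute dictionary rather than a child node
// (child nodes in Array Form are either arrays or plain strings).
function isAttributes(value) {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Convert a JsonML Array Form node into {tagName, attributes, childNodes}.
function normalize(node) {
  if (typeof node === "string") {
    return node; // text nodes stay plain strings
  }
  const [tagName, ...rest] = node;
  // This is the test that Array Form forces on every single element.
  const hasAttributes = rest.length > 0 && isAttributes(rest[0]);
  const attributes = hasAttributes ? rest[0] : {};
  const children = hasAttributes ? rest.slice(1) : rest;
  return { tagName, attributes, childNodes: children.map(normalize) };
}

// Hypothetical Array Form input: one element with attributes, two without.
const address = ["address", { type: "home" },
  ["street", "12345 Sixth Ave"],
  ["city", "Anytown"]
];

const result = normalize(address);
console.log(JSON.stringify(result, null, 2));
```

Once every node carries all three entries, code that walks the tree can read node.attributes and node.childNodes directly without inspecting the second array element first.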
JsonML has the added advantage that all of the processing can be done on the server side (using the handy XSLT transform provided), and I might yet switch to a modified version of it in the future. Still, it is good to have XPath support in GWT through Sarissa. Although Sarissa does not seem to be actively developed, it uses feature sniffing and should thus continue to work in any browser that remains backwards compatible. If GWTQuery gets better support for XML, I would probably use it in conjunction with both of them.