Tuesday, March 27, 2007

Using XQuery to process and integrate XML, relational, web services, and legacy data

Carlo Innocenti (Minollo), XML Technologies Program Manager for DataDirect, spoke to the March meeting of the DC XML Users Group. He began with some general remarks about XQuery, a standard for querying XML that was designed with data integration problems in mind.

Minollo offered an example of a data integration problem: a user wishes to request a report of their stock portfolio. Typically, this would involve a request from the user’s desktop to some combination of a legacy database management system, relational database, and some public web services resource with the current stock price. Users prefer web browsers, so the report must be rendered in HTML. Generally, the solution would involve accessing web services via a SOAP message through AXIS. Developers would probably use Java code to generate the HTML report.

Minollo said this could lead to a dangerous approach, in which multiple data consumers each access multiple data sources for information:

AJAX Client | Dynamic HTML | Web Services | Publishing Apps | REST

_____________________________________________________
Data Access Layer
_____________________________________________________

RDBMS | XML Documents | EDI Messages | Web Services


This model would quickly create a tangled net of connections. The XQuery vision is simpler: an XML layer sits above an XQuery system that accesses the data sources.

Here, Minollo asked the audience how many had looked at XQuery for “more than two minutes”; a few hands went up. He asked how many had seen XQuery for two minutes; a few more hands went up. He asked about XSLT and almost everyone’s hand went up.

XQuery was created by a group of vendors. It is the W3C query language for XML. Minollo characterized it as “the SQL for XML.” It is highly expressive: it can find anything in an XML structure and can combine data from multiple data sources.

Here, Minollo began to walk through some code examples, beginning with:
doc('holdings.xml')/holdings/entry

From there he moved to FLWOR expressions (for, let, where, order by, return), which query and combine data. Such queries can loop over multiple data sources, and such expressions can be nested to any depth.
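To make that concrete, here is a minimal sketch of my own (not one of Minollo's slides) of a FLWOR expression that loops over the holdings and looks up a price in a second document. The file prices.xml, the element and attribute names, and the assumption that each entry has exactly one ticker and one shares element are all hypothetical.

```xquery
(: Assumed document shapes, not taken from the talk:
   holdings.xml: <holdings><entry><ticker>ABC</ticker><shares>100</shares></entry>...</holdings>
   prices.xml:   <prices><quote ticker="ABC"><last>31.50</last></quote>...</prices> :)
for $entry in doc('holdings.xml')/holdings/entry
(: take the first matching quote so the cast below sees at most one value :)
let $quote := (doc('prices.xml')/prices/quote[@ticker = $entry/ticker])[1]
where exists($quote)
order by $entry/ticker
return
  <position ticker="{ $entry/ticker }">
    { xs:decimal($entry/shares) * xs:decimal($quote/last) }
  </position>
```

The for clause iterates over one source, the let clause reaches into another, and the return clause builds one new element per holding that has a quote.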

Minollo further indicated that with XQuery you can build XML and both use and define functions. Functions may be grouped into modules and imported.
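As a rough illustration (again my own, under assumed element names), a query can declare a function in its prolog and then construct new XML directly in the body. The helper local:value and the price element inside each entry are hypothetical.

```xquery
(: local:value is a hypothetical helper; the <price> child of <entry>
   is an assumption about the shape of holdings.xml. :)
declare function local:value($shares as xs:decimal, $price as xs:decimal) as xs:decimal {
  $shares * $price
};

<report>
{
  for $entry in doc('holdings.xml')/holdings/entry
  return
    <row>
      <ticker>{ string($entry/ticker) }</ticker>
      <value>{ local:value(xs:decimal($entry/shares), xs:decimal($entry/price)) }</value>
    </row>
}
</report>
```

Functions like this one can also live in a separate library module and be brought in with an import module declaration, which is presumably what Minollo meant by importing.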

He emphasized that, unlike conventional programming languages, XQuery has native support for XML. There is no need to parse, navigate, and cast, because in XQuery XML is native. Furthermore, nothing in XQuery prevents you from using it with non-XML data; XQuery is designed for data integration.

Minollo pointed out that XML output is very handy. Increasingly it is the industry standard for data exchange. The use of XQuery greatly increases programmer productivity, as there is far less code to write.

He said that with XQuery you are dependent upon implementation. In response to my email requesting clarification of this, Minollo said, “What I said (or meant to say) is that support for heterogeneous data sources in XQuery depends strongly on implementations; most implementations just support querying XML only; others support querying only XML types in the database (typically the RDBMS implementations); others support plenty of data sources, like our DataDirect XQuery. XQuery as a language doesn't specify what data sources must be supported.

About speed, what I meant is that how an XQuery processor performs against a specific data source is again strongly implementation dependent. There are generic XQuery optimizations that can be done to improve the processor's speed and scalability (pretty much like for XSLT or other languages); such optimizations can even reach the point of creating hardware "XQuery" or "XSLT" CPUs.

But those approaches have limits when it comes to dealing with specific data sources; and RDBMS support is a typical example of that. You can optimize "pure XQuery processing" as much as you want, but the only approach to make XQuery perform and scale against RDBMS data is to translate XQuery into SQL where relevant. That's why different XQuery engines may lead to such dramatic performance and scalability changes.”
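To picture what translating XQuery into SQL means, here is a sketch under loudly stated assumptions: the processor exposes a relational table through the standard fn:collection function, with one element per row and one child element per column. How a collection name maps to a table is implementation-defined, and the table name, column names, and user id below are hypothetical.

```xquery
(: Assumption: a table named "holdings" is visible as
   collection('holdings'), one <holdings> element per row,
   with columns userid and stockticker as child elements. :)
for $h in collection('holdings')/holdings
where $h/userid = 'jsmith'
return $h/stockticker

(: An engine that pushes the work down to the database might send
   something like:
     SELECT stockticker FROM holdings WHERE userid = 'jsmith'
   An engine that does not would fetch every row and filter in memory,
   which is where the performance gap Minollo describes comes from. :)
```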


Minollo explained that XQJ is an XQuery API for Java; it is a standard, similar to JDBC. XQJ permits XQuery to fit into any Java architecture. Results can be retrieved as DOM, SAX, StAX, or text.

XQuery resides in the data access layer. XQJ interfaces with the consumers of data, while the XQuery engine accesses the data sources.

Here, Minollo began to explain DataDirect’s XQuery product. It has the ability to perform “streaming inquiries” on large documents, tossing the irrelevant information and retrieving only what you are looking for. For example, you might have an alphabetical list of orders in which you are only interested in one particular order. DataDirect’s XQuery will stream the search and retrieve only that order.
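The kind of query involved might look like the sketch below, which is mine rather than Minollo's; the file name, the id attribute, and its value are assumptions. Whether a query like this is actually evaluated in streaming fashion depends on the processor.

```xquery
(: orders.xml, the id attribute, and the value 'A-1042' are hypothetical. :)
doc('orders.xml')/orders/order[@id = 'A-1042']
```

Because the path only moves forward through the document and the predicate looks only at the order itself, a processor can in principle test each order as it is read and discard the rest without building the whole document in memory.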

Minollo emphasized that DataDirect’s XQuery is entirely standards-based; there is nothing new or unique to DataDirect that programmers have to learn. It is a component and not server dependent. It can convert non-XML data to XML on the fly, including:
EDI message types
comma-delimited or tab-delimited text
dBase
RTF
mbox
Batch conversions and custom conversions are also supported.

A member of the audience asked if DataDirect’s XQuery could insert, delete, or update information. Minollo replied that it was a good question and that DataDirect has a working draft with this function.

Minollo returned to the example of retrieving a stock portfolio saying, “Back to our problem, remember? It’s a nightmare, you can’t forget. I have here a random tool...” Here there was general laughter as it was obviously carefully selected for his presentation. He walked through a long series of code examples demonstrating how DataDirect’s XQuery works.

Resources:
www.xQuery.com
www.xmlconverters.com
