Tuesday, August 03, 2004

Federal XML Work Group, July 21st meeting


Technoflak arrived after the break and so missed Steve Hanby’s presentation on EAI Components in Relation to eGov, LOBs & FEA.

After a brief introduction from Bently Roberts, Clerk of the Merit Systems Protection Board (MSPB), Tim Korb, Manager of Information Services, opened the second session with a description of the MSPB’s requirements for XML technology and the potential for document storage and retrieval.

Korb began by describing the work of the Merit Systems Protection Board. It is part of what used to be the Civil Service, and its mission is to protect the merit system in government employment. In addition to generating studies it adjudicates disputes. Its adjudication system acts as a court, receiving many filings and sending out notices and decisions in field offices across the country. Korb said, “We create a lot of electronic documents, unfortunately not as accessible as they might be.” He lamented “limited search capabilities and even where good, different from one silo to another.” The MSPB must cope with different applications in different repositories, and its documents can vary from one format to another and one version to another.

The MSPB was looking for a solution that would be vendor neutral and would allow for accessibility into the future. Already there are difficulties accessing documents in older versions of Word. Korb asked, “How are we going to get what we have now into XML?” The MSPB needed a solution that would convert virtually any common format into XML and present the look of the original document. They also needed the capability to use the metadata of the legacy documents and the ability to search the documents. They would need the capability to take data from XML and put it into other formats. The MSPB needed to leave the old versions of documents undisturbed, keep old data bases intact and synchronize with the XML data base. Korb explained that the case management system is a relational data base. Here Owen Ambur, chair of the XML Work Group, asked if the MSPB is participating in the Office of Management and Budget’s effort to standardize case management. Korb answered, “Not at this time.”

Korb explained that unlike the Federal Courts, which require electronic documents to be submitted in PDF format, the MSPB permits electronic filings to be submitted in almost any format. Korb went on to observe that it would be nice to have full text search on previous inquiries (FOIA, congressional requests, etc.) so that the MSPB would know how it had previously handled such an inquiry.

Owen Ambur said that the Federal Courts had wanted PDF-A used as the standard, and he wanted to give Adobe credit for building XML capability into their products.

Ambur observed that if we are to have a citizen-centered government, then we must permit citizens to file documents in whatever format they prefer. If government makes it difficult for a citizen to do business with it, then it is not doing its job, at least not in a democracy.

Here Technoflak observed that PDF format does not show a document’s history of revisions. Word and other formats do permit this, as Tony Blair knows to his cost. Ambur expanded on this, saying that IT people do not always understand the business requirement for preserving the integrity of the document, what lawyers refer to as the “four corners of the document.” It is critical for both legal and archival reasons that documents look the same tomorrow as they do today.

Here Tim Korb introduced Ipedo’s team, which was led by Kam Thakker, with Chetan Patel and Alex Chang assisting. They described their system as an XML intelligent platform. Ipedo delivers enterprise information integration by using XML to integrate and manage information from disparate, complex data sources.

Ipedo’s software creates a “virtual data layer” by taking information from a data store, creating a virtual XML file, analyzing the data and assembling it to put it into the appropriate form.
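To make the idea of a virtual XML layer concrete, here is a minimal sketch in Python. It is not Ipedo’s implementation, which was not shown in that detail. It only illustrates reading rows from a relational store and assembling an in-memory XML view while leaving the source table untouched. The table name `filings`, its columns, the sample rows and the helper `rows_to_xml` are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, table, root_tag="cases", row_tag="case"):
    """Build an in-memory XML view of a table; the table itself is only read."""
    # The table name is interpolated directly, so it must come from trusted code.
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    root = ET.Element(root_tag)
    for row in cur:
        item = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            # Each column becomes a child element named after the column.
            ET.SubElement(item, name).text = "" if value is None else str(value)
    return root


# Hypothetical sample data standing in for a case management data base.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE filings (case_no TEXT, doc_no INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO filings VALUES (?, ?, ?)",
    [("A-001", 1, "Initial appeal"), ("A-001", 2, "Agency response")],
)

view = rows_to_xml(conn, "filings")
xml_text = ET.tostring(view, encoding="unicode")
print(xml_text)
```

The XML exists only in memory, so the relational data base stays the system of record, which matches the requirement to keep old data bases intact.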

The system handles relational data bases, documents and messages. It can retrieve structured, semi-structured and even unstructured electronic data. Ipedo can grab live data from public and private web pages and even messages if necessary. Security and encryption are built in so that it can be used outside the firewall.

The application is format and schema independent. It fits with existing developer tools, .Net or Java, and is platform agnostic.

Thakker talked about the evolution of this kind of software from Extract/Transform/Load (data warehousing, bulk loading and historical analysis) to Enterprise Application Integration (scheduled data movement between applications) to Enterprise Information Integration (on-demand data extraction and combination of disparate data sources). Examples of government applications for Ipedo’s technology include intelligence gathering, searching across different data bases, digital asset management, financial reporting (combining financial data from multiple systems) and portal integration.

Ipedo features include XML venues (JDBC and Web Services), XQuery builder, XML rules processing (semantic checks), and vertical accelerants (prebuilt modules and schemas).

The XQuery implementation conforms to W3C XQuery 1.0 draft. Ipedo is an active member of the XQJ expert group (JSR225 - Java API for XQuery). Their XQuery system has strong support for XML Namespaces and has full text search capability.

The Ipedo XML schema management uses both DTDs and W3C schema.

Here, Owen Ambur asked if Ipedo had Web Distributed Authoring and Versioning (WebDAV). Thakker responded that it would be in the next release.

Thakker began a demonstration of Ipedo’s work for the Merit Systems Protection Board with a sample search query -

whistle*

and pulled up search results by case number and document number, using both Word and HTML versions.
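For readers unfamiliar with wildcard queries, the following Python sketch shows one way a trailing-asterisk search such as whistle* can behave. It is an illustration only and says nothing about how Ipedo’s search engine is built. The sample documents, field names and the `search` helper are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def search(documents, pattern):
    """Return (case_no, doc_no) pairs whose text has a word matching the pattern."""
    pattern = pattern.lower()
    hits = []
    for doc in documents:
        # Split the text into lowercase words before matching the wildcard.
        words = re.findall(r"[a-z0-9']+", doc["text"].lower())
        if any(fnmatchcase(word, pattern) for word in words):
            hits.append((doc["case_no"], doc["doc_no"]))
    return sorted(hits)


# Hypothetical documents keyed by case number and document number.
documents = [
    {"case_no": "A-002", "doc_no": 1, "text": "Appeal by a whistleblower alleging reprisal."},
    {"case_no": "A-001", "doc_no": 3, "text": "Notice of hearing schedule."},
    {"case_no": "A-001", "doc_no": 2, "text": "Whistleblowing disclosure statement."},
]

results = search(documents, "whistle*")
print(results)
```

The asterisk matches any run of characters, so the one query finds both “whistleblower” and “whistleblowing”.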

Thakker noted Ipedo had browse capability in Word format. He pointed out that Ipedo took a “blob of unstructured data” and made it semi-structured.

Ipedo uses SVG loaded as XML data to get the document preservation right. (The demonstration showed a Word document presented in SVG format.)

Ipedo uses tagging in Word and XML XSLT style sheet transformation. This permits on demand views and transformation.

Thakker was asked about Ipedo’s two day turnaround. He replied that the MSPB had provided a series of Word documents, which were converted into XML and then imported into Ipedo’s system. Thakker explained that the Word documents already had the metadata. Here Korb said he was proud that the MSPB had been using metadata for a while, and he had had something to do with that. Thakker went on to say that Ipedo queries go into multiple sources. The query default is “or”, so putting “and” will bring up additional documents.

Minutes of the July 21st meeting

The next meeting will be on August 18th.
