Mongo XML

XML storage: each XML document is parsed only once. Its content is converted to XDM, our optimized binary format, and stored as a blob in a MongoDB document. The binary-encoded XML is accompanied by metadata, some of which (meta2) can be extracted from the document by compiled XSLT/XPath.
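The storage step can be illustrated with a minimal sketch that uses only the JDK. It is not the actual implementation: the class name StorageSketch, the field names "title", "year" and "xdm", the sample XPath expressions, and the layout of the stored record are all assumptions, and plain UTF-8 bytes stand in for the XDM encoding, which is not described here. The point is only that the XPath expressions are compiled once and the document is parsed once to produce both metadata and blob.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class StorageSketch {
    // XPath expressions are compiled once and reused for every document.
    // The field names and paths are hypothetical examples.
    private static final Map<String, XPathExpression> META_PATHS = compilePaths();

    private static Map<String, XPathExpression> compilePaths() {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, XPathExpression> paths = new LinkedHashMap<>();
            paths.put("title", xpath.compile("string(/article/title)"));
            paths.put("year", xpath.compile("string(/article/@year)"));
            return paths;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses the XML once and returns a record shaped like the stored
    // document: extracted metadata ("meta2") plus an opaque blob ("xdm").
    static Map<String, Object> toStoredDocument(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Map<String, Object> meta2 = new LinkedHashMap<>();
            for (Map.Entry<String, XPathExpression> e : META_PATHS.entrySet()) {
                meta2.put(e.getKey(), e.getValue().evaluate(dom));
            }

            Map<String, Object> stored = new LinkedHashMap<>();
            stored.put("meta2", meta2);
            // Placeholder only: the real blob would be the XDM encoding.
            stored.put("xdm", xml.getBytes(StandardCharsets.UTF_8));
            return stored;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling StorageSketch.toStoredDocument with a small article document returns a map whose "meta2" entry holds the extracted strings and whose "xdm" entry holds the bytes to be stored as a blob.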

Input

The metadata (JSON) is used to index individual XML documents to speed up subsequent retrieval and sorting. Apart from this metadata, the binary XDM blobs are opaque to MongoDB.

Output

XML retrieval: a normal MongoDB query refers to the metadata portion of the stored documents, ideally returning only the needed content. Retrieved XDM blobs can then be transformed into output (e.g. XHTML) by XSLT/XPath compiled into Java bytecode, without the need to parse XML.
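The "compile once, transform many times" idea can be sketched with the standard JAXP API, where a Templates object is the compiled, thread-safe form of a stylesheet. This is an illustration under assumptions, not the pipeline described above: the class name RenderSketch and the stylesheet are hypothetical, and the sketch feeds XML text to the transformer, so it does parse each input, whereas the XDM-based design avoids exactly that step.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class RenderSketch {
    // Hypothetical stylesheet: renders the article title as a heading.
    private static final String XSLT =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/article\">"
        + "<h1><xsl:value-of select=\"title\"/></h1>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // The stylesheet is compiled a single time; Templates can be shared
    // across threads, each of which creates its own Transformer.
    private static final Templates COMPILED = compile();

    private static Templates compile() {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(XSLT)));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Applies the precompiled stylesheet to one retrieved document.
    // Unlike the XDM pipeline, this stand-in parses XML text here.
    static String render(String xml) {
        try {
            StringWriter out = new StringWriter();
            COMPILED.newTransformer().transform(
                    new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

RenderSketch.render can be called for every document returned by a query; only the per-document transformation cost is paid on each call, because the stylesheet compilation happened once at class initialization.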

The architecture has been optimized for the highest possible throughput and minimal latency; in a recent study we achieved sub-millisecond processing times per XML document, end to end, from MongoDB to final result.

An XML full-text search engine finds specific passages of text where query keywords co-occur. It returns 'query hits' ranked by quality. The query hits refer to binary XML in MongoDB and contain XPath expressions locating the found paragraphs and words to highlight.
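How a query hit might be resolved against a document can be sketched as follows. The shape of a hit is an assumption made for illustration: here it is reduced to one XPath expression for the paragraph plus a list of words to highlight, the class name QueryHitSketch is hypothetical, and square brackets stand in for real highlight markup. The sketch also works on XML text with the JDK's DOM and XPath classes rather than on binary XML.

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class QueryHitSketch {
    // Evaluates the hit's XPath expression and returns the text of the
    // paragraph it points to (the string value of the first matching node).
    static String locate(String xml, String paragraphXPath) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath()
                    .evaluate(paragraphXPath, dom);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Marks every occurrence of each hit word; brackets are a placeholder
    // for whatever highlight markup the output format uses.
    static String highlight(String text, List<String> words) {
        String result = text;
        for (String word : words) {
            result = result.replace(word, "[" + word + "]");
        }
        return result;
    }
}
```

A caller would first use locate with the expression carried by the hit, for example "/article/p[2]", and then pass the returned paragraph text and the hit's words to highlight.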