
ECS Engine

Four Automated Steps

The ECS Engine is an extensible, modular technology that can be adapted to the content conversion needs of any organization. All content passes through four automated steps that uncover the document’s structure and generate valid XML, which can then be transformed to meet specific customer needs.

Exegenix technology processes content in four stages. Visual Data Acquisition identifies the characters and lines on the page. Visual Tokenization determines the boundaries of each object. Structure Identification determines the document’s structure based on the nature of the objects and their relationships. Finally, XML Generation exports an XML file that can be easily transformed using XSL.

Visual Data Acquisition

The first step in the process analyzes a document’s PostScript™ or PDF representation to extract all information about the appearance of the document. This includes the characters in the document and their typography, and any other visual materials.

Visual Tokenization

The Visual Tokenization Phase identifies the basic building blocks of document structure, including many important visual cues, and the large-scale layout areas of the page.

Structure Identification

The Structure Identification Phase places these basic building blocks into a tree structure. This phase identifies sections, paragraphs, quotes, lists, tables, footnotes, and more, and forms a complete, cohesive, internal representation of the structured document.
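
As an illustration only, the idea of placing flat building blocks into a tree can be sketched in a few lines of Python. This is not the Exegenix algorithm, which is not described here. The sketch assumes each block already carries a kind and, for headings, a nesting level of 1 or more; the names `Block`, `Node`, and `build_tree` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A flat building block (hypothetical output of an earlier phase)."""
    kind: str       # e.g. "heading" or "paragraph"
    text: str
    level: int = 0  # heading depth (1 or more); 0 for non-headings


@dataclass
class Node:
    """A node in the document tree."""
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


def build_tree(blocks):
    """Nest blocks under the most recent heading of a shallower level."""
    root = Node("document")
    # Stack of (heading level, open section); the root sits at level 0.
    stack = [(0, root)]
    for block in blocks:
        if block.kind == "heading":
            # Close open sections at the same or a deeper level.
            while len(stack) > 1 and stack[-1][0] >= block.level:
                stack.pop()
            section = Node("section", block.text)
            stack[-1][1].children.append(section)
            stack.append((block.level, section))
        else:
            # Non-heading blocks attach to the innermost open section.
            stack[-1][1].children.append(Node(block.kind, block.text))
    return root


blocks = [
    Block("heading", "A", 1),
    Block("paragraph", "p1"),
    Block("heading", "A.1", 2),
    Block("paragraph", "p2"),
    Block("heading", "B", 1),
]
tree = build_tree(blocks)
```

Here sections "A" and "B" become children of the document root, and "A.1" is nested inside "A" together with its paragraph. A real system would infer the levels from visual cues rather than receive them as input.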

XML Generation

This phase uses the internal representation of the document to export an XML file that not only presents the document’s content and logical structure but also retains all relevant formatting information. The XML file is designed for ease of use in XSL Transformation scripts.
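
To make the idea of XML that carries both structure and formatting concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The element names, the `font-size` and `font-weight` attributes, and the dictionary layout are assumptions chosen for illustration; they do not reflect the actual Exegenix output schema.

```python
import xml.etree.ElementTree as ET


def to_xml(node):
    """Convert a nested dict into an XML element.

    Hypothetical node layout: 'tag' (required), plus optional 'text',
    'format' (a dict of formatting properties), and 'children'.
    """
    # Formatting information is kept as attributes on the element.
    attrs = {key: str(value) for key, value in node.get("format", {}).items()}
    elem = ET.Element(node["tag"], attrs)
    elem.text = node.get("text")
    for child in node.get("children", []):
        elem.append(to_xml(child))
    return elem


doc = {
    "tag": "document",
    "children": [
        {
            "tag": "section",
            "children": [
                {"tag": "title", "text": "Overview",
                 "format": {"font-size": 14, "font-weight": "bold"}},
                {"tag": "para", "text": "Body text.",
                 "format": {"font-size": 10}},
            ],
        }
    ],
}

xml_text = ET.tostring(to_xml(doc), encoding="unicode")
print(xml_text)
```

Keeping formatting in attributes and logical structure in the element hierarchy lets a downstream XSL stylesheet match on either one, for example selecting every `para` element or only those with a given `font-size`.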

Beyond XML

The Exegenix approach to uncovering document structure has applications beyond the conversion of documents to XML. Exegenix technology can be integrated with both traditional and XML-enabled applications. In fact, any application that relies on understanding the structure of a document, such as a natural language parser or a search engine, can benefit from Exegenix technology.