
ECS Engine

Four Automated Steps

The ECS Engine is an extensible, modular technology that can be adapted to meet the content conversion needs of any organization. All content passes through four processes that uncover the document's structure and generate valid XML, which can then be transformed to meet specific customer needs.
Visual Data Acquisition

The first step in the process analyzes a document's PostScript or PDF representation to extract all information about the appearance of the document. This includes the characters in the document and their typography, and any other visual materials.

Visual Tokenization

The Visual Tokenization Phase identifies the basic building blocks of document structure, including many important visual cues, and the large-scale layout areas of the page.

Structure Identification

The Structure Identification Phase places these basic building blocks into a tree structure. This phase identifies sections, paragraphs, quotes, lists, tables, footnotes, and more, and forms a complete, cohesive internal representation of the structured document.

XML Generation

This phase uses the internal representation of the document to export an XML file that not only presents the document's content and logical structure but also retains all relevant formatting information. The XML file is designed for ease of use in XSL Transformation scripts.

Beyond XML

The Exegenix approach to uncovering document structure has applications beyond the conversion of documents to XML. Exegenix technology can be integrated with both traditional and XML-enabled applications. In fact, any application that can benefit from understanding the structure of a document, such as natural language parsing applications or search engines, can benefit from Exegenix technology.
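
As a rough illustration only, the four steps described above can be pictured as a chain of functions, each consuming the output of the previous one. The sketch below is not the ECS Engine or its API. Every name in it (TextLine, Block, Node, acquire_visual_data, tokenize, identify_structure, generate_xml, convert) is hypothetical, the "page" is a hand-written list of tuples rather than real PostScript or PDF data, and the heading rule is a deliberately simplistic assumption.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class TextLine:
    """One line of text plus its appearance (stand-in for PostScript/PDF data)."""
    text: str
    font_size: float
    bold: bool = False


@dataclass
class Block:
    """A basic building block produced by the toy tokenization step."""
    kind: str  # "heading" or "paragraph" (toy vocabulary, assumed here)
    text: str
    font_size: float


@dataclass
class Node:
    """A node in the toy structure tree."""
    tag: str
    text: str = ""
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def acquire_visual_data(raw):
    # Step 1 (toy): a real system would read PostScript or PDF; here the
    # input is already a list of (text, font_size, bold) tuples.
    return [TextLine(text, size, bold) for text, size, bold in raw]


def tokenize(lines, body_size=10.0):
    # Step 2 (toy): use visual cues (size, weight) to classify each line.
    blocks = []
    for line in lines:
        is_heading = line.bold or line.font_size > body_size
        kind = "heading" if is_heading else "paragraph"
        blocks.append(Block(kind, line.text, line.font_size))
    return blocks


def identify_structure(blocks):
    # Step 3 (toy): each heading opens a new section; paragraphs attach to
    # the most recent section, or to the root if no heading has been seen.
    root = Node("document")
    current = root
    for block in blocks:
        fmt = {"font-size": str(block.font_size)}
        if block.kind == "heading":
            current = Node("section")
            current.children.append(Node("title", block.text, fmt))
            root.children.append(current)
        else:
            current.children.append(Node("para", block.text, fmt))
    return root


def generate_xml(node):
    # Step 4 (toy): export the tree as XML, keeping formatting as attributes.
    elem = ET.Element(node.tag, node.attrs)
    if node.text:
        elem.text = node.text
    for child in node.children:
        elem.append(generate_xml(child))
    return elem


def convert(raw):
    """Run the four toy steps in order and return the XML as a string."""
    tree = identify_structure(tokenize(acquire_visual_data(raw)))
    return ET.tostring(generate_xml(tree), encoding="unicode")


if __name__ == "__main__":
    page = [
        ("Overview", 14.0, True),
        ("First paragraph.", 10.0, False),
        ("Second paragraph.", 10.0, False),
    ]
    print(convert(page))
```

Because the output keeps both the logical structure (section, title, para) and the formatting (the font-size attribute), a downstream transformation could select on either one, which is the property the XML Generation step is described as preserving.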