
ECS Feature Roadmap

Recent Additions

Support for new structures

'Dot Leader' Support - In tables of contents and other list-like structures, the formatting often includes lines of dots that lead the reader's eye across the page. These 'Dot Leaders' are an artifact of publishing, and during conversion it is usually desirable to identify them as such. This new feature makes it much easier to handle Dot Leaders in whatever way the output specifications of an individual conversion project require, and it uses identified Dot Leaders as indicators of the position of table column separators.

Production management enhancements

Performance of the ECS Engine - The ECS Engine already has the best performance relative to configuration in the market today; recent performance improvements extend this lead.

Enhanced control over exclusions - During the conversion process, insignificant content may interfere with content object identification. Enhanced tools allow specific types of data to be excluded in specific situations, so less effort is required to remove them in post-conversion processes.

Text Attribute properties displayed - When performing quality assurance or developing an overall conversion process for a given document set, it is important to have a robust and complete set of analysis tools at your fingertips. The types of properties displayed to operators in the ECS Inspector, and the way they are displayed, have been improved and clarified.

Improved support for complex structures

The Engine now distinguishes background for type from other backgrounds - When text has a background color or shade, that background is sometimes significant, marking the passage as distinct in some way. Because the Engine now distinguishes such backgrounds, post-processing options are increased and automatic recognition of sidebars is improved.

Identification of Inline Titles - Exegenix has enhanced section title recognition to include titles that form part of the first line of the first paragraph in a subsection. Known as "Inline" or "Run-In" titles, they are marked as such and contribute to the overall document hierarchy, as needed to meet the output requirements.

Improved Table Column Recognition - Tables continue to be one of the most challenging aspects of conversion, and Exegenix has improved its already substantial lead with increased accuracy in identifying table column separators, both those that are visible and those implied by whitespace gutters.

Support for additional color encoding techniques - There are a number of techniques for encoding color within an electronic file. Recent extensions of support for various such techniques provide increased post-processing capabilities.

Better identification of rectangles made of complex multi-line segments - To the human eye, a number of lines can appear as a box, but a computer may draw those lines in any number of ways, which can cause the shape to be identified incorrectly rather than as a simple four-sided box. Improvements have been made in correctly determining the nature of such complex line drawings.

Improved Crop mark support - Pages destined for a printing press, to be bound as books, may have cut marks indicating where the page should be trimmed. When an electronic version of such a document is converted, these cut marks are to be discarded. Improvements have been made in determining that such marks indicate the page boundary, so important content is correctly distinguished from these irrelevant marks.

Better identification of typeface formatting based on typeface name - Typeface vendors use a number of techniques to indicate that text is bold, italic or special in some way. Improvements to the recognition of these significant indicators mean that typefaces are now very rarely incorrectly identified as being bold or italic.

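As an illustration of name-based typeface formatting, the following is a minimal sketch, not the ECS Engine's actual method, which this page does not describe. It assumes that style information appears as whole words in the typeface name and that a PDF subset prefix (six capital letters followed by "+") may precede the name. The function name style_from_font_name and the keyword lists are hypothetical.

```python
import re

# Hypothetical keyword lists; real vendors use many more conventions.
BOLD_WORDS = {"bold", "semibold", "demibold", "black", "heavy"}
ITALIC_WORDS = {"italic", "oblique", "ital", "it"}


def style_from_font_name(font_name: str) -> dict:
    """Guess bold/italic flags from whole words in a typeface name."""
    # Drop a PDF subset prefix such as "ABCDEF+" if present.
    name = re.sub(r"^[A-Z]{6}\+", "", font_name)
    words = []
    # Split on common separators, then split CamelCase runs into words.
    for part in re.split(r"[-_,\s]+", name):
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    lowered = {w.lower() for w in words}
    return {
        "bold": bool(lowered & BOLD_WORDS),
        "italic": bool(lowered & ITALIC_WORDS),
    }


if __name__ == "__main__":
    for sample in ("ABCDEF+Helvetica-BoldOblique", "Italiana-Regular"):
        print(sample, style_from_font_name(sample))
```

Matching whole words rather than substrings is the design choice that matters here: a family name such as "Italiana" contains the letters of "Italia" but is not reported as italic, which is the kind of false positive the feature above aims to avoid.
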
Improved superscript and subscript recognition - The means of moving a character into the "superscript" or "subscript" position within documents vary greatly. Improvements to the recognition of such formatting ensure more accurate, higher quality output.

Improvements in the recognition of background filled boxes - Document designers often use boxes of colour to liven up the presentation of the important content, but these design elements are not important content in and of themselves. Improvements have been made in identifying such background objects and marking them as such.

Improvements in determination of underline and strikethrough formatting - When creating printed versions of documents, publishing systems may use a number of techniques to draw the lines that indicate underscored or strikethrough text. Improvements in support of these diverse techniques result in better retention of text formatting as the author intended, regardless of the publishing system used.

Improvements in mark recognition and resulting lists and sections - Marks are used to indicate the start of new list items or sections. Improvements to mark identification result in more accurate identification of the overall document hierarchy.

Improvements in positioning of floating object anchors - Objects that are related to the main flow of text, but not a direct part of that flow, may float and be positioned between paragraphs or at the start or end of a subsection. Improvements have been made in interpreting the document and determining where an object's ideal position should be.

In The Pipeline

Support for new structures

Front-matter metadata - Documents that have regions containing metadata, such as author name and publication date, can be tagged as such directly in the ECS Inspector, rather than having such tagging performed as part of a post-conversion tagging task.

Admonishment objects - Caution, Warning, Note, and other types of special content will be tagged as such directly in the ECS Inspector. This feature extends ECS's industry-best output transformability options.

Production management enhancements

Conversion project requirements/specifications toolset - For every new dataset being converted, the Specification toolset will allow conversion analysts to create clear and unambiguous visual specification documentation. For many classes of structures the default behaviour is acceptable, but in special cases these specs will provide uniform and clear directions to conversion and quality assurance operators, and to customers of conversion services.

Ongoing improvements to support problem PDF files - The world of PDF and PostScript is broad, and new PDF generators are being released all the time, with varying degrees of technical compliance with the PDF / PostScript specification. Exegenix monitors the industry closely and enhances its technology to better tolerate PDF documents that, to a greater or lesser extent, violate established PDF standards.

New feed-level hinting - When the documents in a dataset share distinguishing characteristics, these feed-level hints will allow an operator to set guidelines in a single document that can be applied across an entire dataset, decreasing quality assurance workload while still leaving full control in the hands of the operator.

Inspector Pause facility - For ongoing performance analysis and conversion project management, when a document analysis session goes idle, a log entry will be made, making high-level productivity analysis more accurate.

Character de-obfuscation solver - Conversion industry professionals are aware of the difficulties that arise when typeface and character names in PDF and PostScript files are masked, or 'obfuscated'.
Exegenix continues to lead the industry in resolving this technically challenging issue, and has now provided a facility to overcome such character ambiguities once and for all.

Improved support for complex structures

Figure content model enhancements - Grouping objects labelled as "Figures" do not exclusively contain pictures or diagrams. In cases where a "Figure" refers to a combination of text blocks, tables or other types of non-graphical object, the Exegenix Conversion Engine will establish the type of content contained within the Figure group and create XML that reflects those discoveries.

Support for Index content - Index content often has special requirements in a conversion process. This content is currently supported in Exegenix Conversion Technology in a general way, but specialised support for Index structures is forthcoming.

Automated Literal Layout identification - Within a given range of the document, there can be text blocks where the exact positioning of the characters relative to one another must be retained. These areas will be identified automatically, where previously the affected page area had to be provided as an instruction from a human operator.

Automated Mathematics identification - Complex display math equations were previously recognised as regular graphics, although an operator could mark them as being of the type "equation". The ECS Engine is now able to identify such a graphic as an equation automatically, create the "equation" object structures accordingly, and identify inline mathematics as well.