
The Exegenix Export DTD

Design goals

Based on DocBook
The Exegenix Export DTD employs a structure model, element and attribute names, and a table model chosen to conform to those commonly used in the industry, particularly those found in the DocBook DTD (http://www.docbook.org), with some augmentations that provide for:

Transformability
The DTD has been designed to facilitate transformation of the XML output using tools such as XSLT. The principal way in which this has been done is by providing container elements that give scripts easy access to groups of related elements. For example:
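As a rough illustration of the container idea, the following Python sketch (not part of any Exegenix tooling) uses the <sectionbody> container described under Hierarchy to collect the paragraphs of a section in one step. The element names come from this page; the sample content and the helper name `body_paragraphs` are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element names follow this page, the content is invented.
SAMPLE = """
<document>
  <section>
    <title><block><line>Introduction</line></block></title>
    <sectionbody>
      <para><block>First paragraph.</block></para>
      <para><block>Second paragraph.</block></para>
    </sectionbody>
  </section>
</document>
"""


def body_paragraphs(section):
    """Return the <para> children of a section's <sectionbody> container."""
    body = section.find("sectionbody")
    return [] if body is None else body.findall("para")


root = ET.fromstring(SAMPLE)
section = root.find("section")
texts = [para.findtext("block") for para in body_paragraphs(section)]
print(texts)
```

Because the body content sits in its own container, a script can reach it without first filtering out the title or any section-level headers and footers.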
The similarity of our markup to industry-"standard" markup also facilitates transformation. In addition, formatting information is repeated redundantly at the block level rather than left to inheritance rules, which makes script development easier.

Block structure
We have opted for a very flexible block structure, in which most block elements can contain any other block element type (including their own). For authoring, a DTD with a rigid content model prevents authors who use validating authoring tools from misusing elements (a prescriptive DTD design). We feel a more flexible approach is required for a descriptive DTD, which is intended to model an extremely wide variety of documents rather than enforce particular authoring rules. It also makes the DTD simpler to write and understand.

Hierarchy
The DTD can represent a hierarchy with any number of levels. The entire document is wrapped in a <document> element. Subsequent hierarchical divisions (including "chapter") are represented by the <section> element, which can be nested to any depth. The different levels can be distinguished using the optional level attribute, or simply by counting the <section> ancestors of a particular <section>. Sections have an optional title (<title>) and can store section-level headers and footers. The remaining contents of a section are contained in a <sectionbody> element.

Paragraph structure
The <para> element represents a paragraph, which by its broadest definition encompasses a sequence of thematically linked blocks. For example, a block of text that introduces a list, continues after the list ends, and later references a block quote can all be enclosed in the same <para> element. For this reason, ordinary text inside a <para> is wrapped in a <block> element, in order to avoid mixing block and inline content.
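A minimal Python sketch of this arrangement follows. It assumes a <para> whose text runs are wrapped in <block> elements around an <itemizedlist>; the <listitem> element name is borrowed from DocBook as an assumption, and the helper `has_mixed_content` is hypothetical. The check confirms that no bare text sits directly beside the block-level children.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph: text before and after a list, all inside one <para>.
# <listitem> is an assumed name borrowed from DocBook; it is not given on this page.
PARA = ET.fromstring("""
<para>
  <block>The export handles several list types:</block>
  <itemizedlist>
    <listitem><block>ordered lists</block></listitem>
    <listitem><block>unordered lists</block></listitem>
  </itemizedlist>
  <block>Each of these is described below.</block>
</para>
""")


def has_mixed_content(elem):
    """True if elem holds non-whitespace text directly alongside child elements."""
    if len(elem) == 0:
        return False
    own_text = [elem.text] + [child.tail for child in elem]
    return any(text and text.strip() for text in own_text)


child_tags = [child.tag for child in PARA]
print(child_tags, has_mixed_content(PARA))
```

A paragraph written as `<para>loose text<block>x</block></para>` would make the same check return True, which is the situation the <block> wrapper is meant to avoid.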
Furthermore, contiguous blocks of text that exhibit formatting differences (for example, lines that are shorter because they wrap around an image) can each be represented by a <fragment>.

Titles
Titles can contain any block content. In practice, we expect that most titles will be marked up as one or more <block> elements, each representing contiguous lines of title text, where each physical line is contained in a <line> element.

Lists
The DTD supports ordered (<orderedlist>), unordered (<itemizedlist>), and compound (<compoundlist>) lists. Compound lists are a generalization of the HTML "definition list": each "term" item can have an arbitrary number of sibling "definition" items and an arbitrary number of sub-items. For ordered and unordered lists, list marks (bullets, numbers) are represented by a distinct <mark> element whose content is the mark itself.

Other blocks
The DTD supports the following other block constructs: block quote, literal layout, note, equation, and side bar.

Table model
To represent tables we use the CALS model with some augmentations. Because all table models (including HTML) share a table-row-cell structure, our tables can be readily transformed to a client's preferred model when that model is not CALS. To the CALS model we have added some HTML attributes, such as cellspacing and cellpadding, as well as a richer set of separators.

Inline emphasis
Most forms of inline emphasis are represented by the <emphasis> element. Individual styles are distinguished by the values of the relevant formatting property attributes. For example, font-weight="bold" represents bold emphasis. Following common industry practice, subscripts and superscripts are represented by the specific elements <subscript> and <superscript>.

Formatting
Formatting properties are represented in attributes of the various objects themselves. These attributes were adopted from CSS and XSL-FO: for example, font-weight and font-style.
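The following Python sketch shows how a script might pick out emphasis by its formatting attributes. The font-weight="bold" value comes from this page; the font-style="italic" value is an assumption based on CSS, and the sample text and the helper `emphasized` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical block with inline markup; only font-weight="bold" is taken
# from this page, font-style="italic" is assumed from CSS.
BLOCK = ET.fromstring(
    '<block>Water is H<subscript>2</subscript>O; handle it with '
    '<emphasis font-weight="bold">care</emphasis> and '
    '<emphasis font-style="italic">patience</emphasis>.</block>'
)


def emphasized(block, prop, value):
    """Return the text of each <emphasis> whose formatting attribute matches."""
    return [e.text for e in block.iter("emphasis") if e.get(prop) == value]


bold = emphasized(BLOCK, "font-weight", "bold")
italic = emphasized(BLOCK, "font-style", "italic")
print(bold, italic)
```

Since the style is carried on the element itself, the lookup needs no stylesheet or ID resolution.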
We felt that storing this information in the same file as the document content and structure would make the XML output easier to work with than storing it in a separate file and linking it to elements in the main file by way of IDs or some other mechanism.

Pagination
Page breaks are represented by the <beginpage> element. This element records the ordinal page number in the source document. It can also contain page-specific headers and footers, and it stores any footnotes that appear on the page.
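Returning to the Hierarchy description, the two ways of determining a section's level can be sketched in Python as follows. This is only an illustration under stated assumptions: that nested sections live inside the parent's <sectionbody>, that the level attribute holds a 1-based integer, and that a title's text sits in <block>/<line>. The helper name `section_levels` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; nesting inside <sectionbody> and a 1-based integer
# "level" attribute are assumptions, not statements from the DTD itself.
DOC = ET.fromstring("""
<document>
  <section level="1">
    <title><block><line>Chapter</line></block></title>
    <sectionbody>
      <section>
        <title><block><line>Subsection</line></block></title>
        <sectionbody/>
      </section>
    </sectionbody>
  </section>
</document>
""")


def section_levels(elem, depth=0):
    """Yield (title, level) for every <section>, outermost first.

    depth is the number of <section> ancestors seen so far; it is used
    when a section carries no explicit level attribute.
    """
    for child in elem:
        if child.tag == "section":
            explicit = child.get("level")
            level = int(explicit) if explicit is not None else depth + 1
            yield (child.findtext("title/block/line"), level)
            yield from section_levels(child, depth + 1)
        else:
            yield from section_levels(child, depth)


levels = list(section_levels(DOC))
print(levels)
```

The walk carries the ancestor count down the tree because ElementTree elements do not keep a reference to their parent.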