RATFINK

A library of RTF output utilities for Tcl
Version 0.9

Joe English
Last updated: Sunday 27 June 1999, 15:31 PDT



1 Introduction



RTF, or Rich Text Format, is Microsoft's interchange format for word processing files. RATFINK is a set of Tcl routines for creating RTF files, and a Cost interface for converting SGML to RTF.

RTF is also the basis for Windows Help (WINHELP) format. RATFINK does not currently contain any direct support for WINHELP.

rtflib.tcl contains the low-level utility routines. This file is not Cost-specific and may be used in any Tcl script. RTF.spec contains the extra Cost commands for SGML conversion.

1.1 Overview

To use RATFINK with Cost, create a translation script that does the following:

  1. Load the library code with the command require RTF.spec
  2. Define the style sheet entries using rtf:paraStyle and the other commands described in 2. ``Declaration commands''
  3. Define a Cost specification mapping each element type in the source DTD to one of the RATFINK processing forms as described in 4. ``SGML Conversion''
  4. Call rtf:start to begin RTF output
  5. Call rtf:convert to process the document
  6. Call rtf:end to finish RTF output.

NOTE -- The last three steps may be done in the main procedure.

Then run

sgmls sgmldecl yourdoc.sgml | costsh -S yourscript.spec > output.rtf

to convert yourdoc.sgml to RTF.

A short example

# Define stylesheet:
rtf:paraStyle body "Body Text" {
    Font Roman
    FontSize 10pt
    LeftIndent 0.5in
}
rtf:paraStyle heading "Heading 1" {
    Font Sans
    FontSize 14pt
    Bold 1
    LeftIndent 0pt
}

# Specify processing for each element type:
specification rtfSpec {
  {element P} {
    rtf para
    paraStyle body
  }
  {element H1} {
    rtf para
    paraStyle heading
  }
}

# Main routine:
proc main {} {
    rtf:start
    rtf:convert rtfSpec
    rtf:end
}

1.2 A brief description of RTF

Lexically, RTF files are a simple stream of text, control words, and groups. All text is seven-bit ASCII. Control words are lowercase alphabetic tokens beginning with a backslash, followed by an optional integer parameter, and terminated by a space or a non-alphanumeric character. Groups are nested data enclosed in curly braces.

Semantically, things start to get complicated.

RTF files start with a header, which contains a font table, color table, stylesheet, and other metainformation. The header is followed by the document data. Everything in the document is a paragraph, except for the stuff that isn't. The stuff that isn't a paragraph includes

Actually it's all very messy and you shouldn't have to worry about it too much except to note that every block of displayed data is considered a paragraph, and that RTF has a ``flat'' structure: sections, paragraphs, and table rows do not nest.

NOTE -- Groups on the other hand do nest, and group boundaries can cross section, paragraph, row, and cell boundaries, but don't worry about that either.
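
To make the lexical rules concrete, here is a small sketch in plain Tcl that builds control words and groups as strings. The helper names controlWord and group are invented for this illustration and are not part of RATFINK; \b, \i, and \fs are ordinary RTF control words for bold, italic, and font size.

```tcl
# Hypothetical helpers for illustration only; not part of rtflib.tcl.

# A control word: backslash, lowercase name, optional integer parameter,
# terminated here by a single space.
proc controlWord {name {param ""}} {
    return "\\$name$param "
}

# A group: nested data enclosed in curly braces.
proc group {body} {
    return "{$body}"
}

# Groups nest: bold text containing an italic phrase.
set inner [group "[controlWord i]italic"]
set outer [group "[controlWord b]bold and $inner"]
puts $outer
```

This prints {\b bold and {\i italic}}. A control word with a parameter, such as [controlWord fs 24], comes out as \fs24 followed by the terminating space.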

2 Declaration commands



2.1 Stylesheet entries



Formatting is determined by stylesheet entries. There are three types of stylesheet entries in RTF: character styles, paragraph styles, and section styles. These are defined with the commands rtf:charStyle, rtf:paraStyle, and rtf:sectStyle, respectively. All three commands have the same syntax:

rtf:xxxStyle id "description"  [ -basedon styleid ] {
    param value
    param value
    ...
}

Each stylesheet entry has a symbolic id by which it is referenced in other commands, and a textual description, which is inserted into the RTF file as the style name.

TIP -- Some RTF style names are interpreted specially by various word processors; see 2.6. ``Special style names'', below.

If -basedon is specified, then all of the style properties listed in the named styleid are copied to the style being defined. Parameters in the new style override those in the base style. styleid must be the id of a previously-defined style of the same type (paragraph, character, or section).

Style properties are specified as a list of param-value pairs. Parameters map (more or less) onto RTF control words. Parameter values take one of several forms, depending on the parameter. Boolean parameters specify true/false values; the value must be 1 or 0. Flag parameters are like booleans, but the parameter can only be turned on (flag parameters correspond to properties which are off by default and turned on by the presence of an RTF control word.) Dimension parameters specify a length as a number followed by a unit name, e.g., 12pt, 0.5in; see 2.1.1. ``Dimensions'' for more details. Other parameters may be integers, a list of enumerated values, or some other type as described below.
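
As a sketch of how -basedon inheritance reads in practice, the fragment below derives a block-quote style from a body style. The style ids body and quote and their descriptions are illustrative only, and the fragment assumes rtflib.tcl has already been loaded.

```tcl
# Base style: every property listed here is copied into derived styles.
rtf:paraStyle body "Body Text" {
    Font roman
    FontSize 10pt
    LeftIndent 0.5in
}

# Derived style: inherits Font and FontSize from body;
# LeftIndent is overridden, RightIndent and Italic are added.
rtf:paraStyle quote "Block Quote" -basedon body {
    LeftIndent 1in
    RightIndent 0.5in
    Italic 1
}
```

Because the base style must already exist when -basedon is processed, body has to be declared before quote.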

2.1.1 Dimensions

Most RTF control words expect lengths to be specified in twips; others expect them to be specified in half-points. There are 20 twips to a point, and there are 72 points to an inch.

NOTE -- There are several definitions of a ``point''; RTF uses the conventional DTP definition of exactly 72 points to the inch.

Since twips and half-points are not a very convenient way to specify lengths, RATFINK allows you to specify dimensions in terms of other units and converts to twips or half-points as appropriate.

Dimension specifications are decimal numbers, with an optional minus sign and fractional part, followed immediately by one of the following units:

in
inches
pt
points (72 points = 1 inch)
pc
pica
picas (6 picas = 1 inch)
twip
1/20th point
mm
millimeters (but see note)
cm
centimeters (but see note)

NOTE -- Due to roundoff errors in converting millimeters to twips, the metric units are unreliable. Also, different versions of Word seem to use different rounding conventions. It's best to avoid cm and mm if possible.

For example,

rtf:paraStyle body "Body Text" {
    FontSize  10pt
    LineSpacing  12pt
    LeftIndent  1in
    FirstIndent  0.5in
}

The rtf:cvTwips and rtf:cvHalfpts functions convert dimension specifications to twips and half-points, respectively.

rtf:cvTwips dimension

Returns the value of dimension in twips, rounded to the nearest integer.

rtf:cvHalfpts dimension

The same as rtf:cvTwips, but returns the value in half-points.
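
The arithmetic behind these conversions can be sketched in plain Tcl. The procedure name toTwips is hypothetical and this is not RATFINK's implementation (use rtf:cvTwips in real scripts); it only illustrates the unit factors given above, together with the fact that one inch is 25.4 mm.

```tcl
# Hypothetical sketch; not the code in rtflib.tcl.
# Converts a dimension such as 0.5in or 12pt to twips, rounded to the
# nearest integer. 1 point = 20 twips, 1 inch = 72 points = 1440 twips,
# 1 pica = 12 points = 240 twips.
proc toTwips {dim} {
    if {![regexp {^(-?[0-9]+(?:\.[0-9]+)?)(in|pt|pc|pica|twip|mm|cm)$} $dim -> digits unit]} {
        error "bad dimension: $dim"
    }
    # Parse as a float so that values like 08 are not taken as octal.
    scan $digits %f num
    switch -- $unit {
        in      { set perUnit 1440.0 }
        pt      { set perUnit 20.0 }
        pc -
        pica    { set perUnit 240.0 }
        twip    { set perUnit 1.0 }
        mm      { set perUnit [expr {1440.0 / 25.4}] }
        cm      { set perUnit [expr {1440.0 / 2.54}] }
    }
    return [expr {round($num * $perUnit)}]
}

puts [toTwips 0.5in]
puts [toTwips 12pt]
```

The two calls print 720 and 240. The metric factors are not whole numbers of twips, which is where the roundoff problems mentioned in the note above come from.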

2.1.2 Fonts

NOTE -- At present, the available fonts are hard-coded and may not be changed.

The available fonts are:

roman
The default font; currently Times New Roman
sans
A sans-serif font; currently Arial
mono
A monospaced font; currently Courier New

2.1.3 Tab stops

Every paragraph style may include its own set of tab stops. This is specified by the TabStops paragraph style property.

Tab stops are specified as a list of tabspecs; each tabspec is a list consisting of a dimension followed by any of the following property specifications:

Align
Specifies how the text following the tab is to be aligned with respect to this tab stop. One of: Left, Right, Center, or Decimal.
Leaders
One of:
Dot
Leader dots or periods
Thick
Leader thick line
Underscore
Leader underline
Equal
Leader equal sign (=)
Hyphen
Leader hyphens (-)

Tab stops may also be defined with the rtf:tabStops command:

rtf:tabStops name { tabspec ... }

Defines and assigns to name a set of tab stops. name may be referenced by a TabStops parameter in subsequent paragraph style definitions.

For example,

rtf:tabStops normaltabs {
    80pt 160pt 240pt 320pt 400pt 480pt 560pt 640pt 720pt
}
rtf:tabStops threepart {
	"3in Align Center"
	"6in Align Right"
}
rtf:paraStyle header "Page header" {
    TabStops threepart
}
rtf:tabStops toctabs { "6in  Align Left  Leaders Dot" }

The TabStops paragraph style property may not be overridden: tab stops in a paragraph style are added to those in the base paragraph style.

NOTE -- The RATFINK syntax for tab stops is messy and counterintuitive, and will probably change...

2.1.4 Rules and borders

RATFINK predefines the rule styles thin, thick, and double. Additional rule and border styles may be defined with the rtf:ruleStyle command. Like rtf:tabStops, this defines a symbolic name for a particular rule style that is referenced as the value of other stylesheet parameters.

rtf:ruleStyle name {
    param value
    ...
}

Defines a rule style named name. Allowable parameters are:

Style
One of the symbols Normal, Thick, Double, Shadow, Dash, Dot, or Hairline. Thick specifies a double-thickness border; Normal is the default.
Thickness
Dimension: thickness of the rule
Margin
Dimension: space to leave between the rule and the text to which the rule is attached.

NOTE -- It is unclear what the \brdrhair (Style Hairline) control word means, since the thickness of the rule is actually determined by the \brdrw (Thickness) control word.

RATFINK makes sure that all the control words are output in the correct order.
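
For example, a custom rule style can be declared once and then referenced from the border parameters of a paragraph style (see 2.3. ``Paragraph styles''). The names boxrule and warning below are illustrative, and the particular values are arbitrary; the fragment assumes rtflib.tcl has been loaded.

```tcl
# A 1pt rule drawn 3pt away from the text it is attached to.
rtf:ruleStyle boxrule {
    Style Normal
    Thickness 1pt
    Margin 3pt
}

# Reference the rule style by name in a paragraph style.
rtf:paraStyle warning "Warning" {
    TopBorder boxrule
    BottomBorder boxrule
}
```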

2.2 Character styles

Character styles are used for regions of text within a paragraph. All of the character style parameters are also valid paragraph style parameters.

rtf:charStyle id "description" [ -basedon styleid ] {
    param value
    ...
}

Available parameters are:

Font
One of the symbolic font names defined in the font table; e.g., roman, sans, or mono. See 2.1.2. ``Fonts''.
FontSize
Dimension: height of the font. If no units are specified, defaults to half-points. Does not include leading.
Bold
Boolean: if set, use bold font variant
Italic
Boolean: if set, use italic font variant.
AllCaps
Boolean: if set, folds all text to uppercase
SmallCaps
Boolean: if set, folds lowercase letters to small caps.
Hidden
Boolean: hidden text.
Underline
One of:
0
None
No underline
1
Single
Single (continuous) underline
Double
Double underline
Dot
Dotted underline
Word
Underline words only (single underline)

NOTE -- The ``shadow'' and ``outline'' character formatting properties have been omitted on aesthetic grounds. Considering how Word typically renders ``small caps,'' that should probably be avoided as well.

NOTE -- Apparently it is not possible in RTF to specify double word underline.

2.3 Paragraph styles

rtf:paraStyle id "description" [ -basedon styleid ] {
    param value
    ...
}

Defines a paragraph style, which may be referenced by id in a later call to rtf:startPara.

Available parameters are:

LeftIndent
Dimension: Left margin, relative to page margin
RightIndent
Dimension: Right margin, relative to page margin
FirstIndent
Dimension: Indentation of first line, relative to LeftIndent.
Quadding
One of the symbols
Left
Flush left, ragged right
Right
Flush right, ragged left
Center
Each line centered
Justify
Both margins aligned
SpaceBefore
Dimension: vertical whitespace inserted before the paragraph
SpaceAfter
Dimension: vertical whitespace inserted after the paragraph
LineSpacing
Dimension: minimum height of each line (including leading). If omitted or set to zero, the font height is used.
TabStops
Reference to a set of tab stops previously defined with rtf:tabStops (see 2.1.3. ``Tab stops'').
Hyphenate
Boolean: enable (1) or disable (0) hyphenation for this paragraph.
PageBreakBefore
Force a page break before this paragraph
KeepTogether
Flag: disallow page breaks within this paragraph
KeepWithNext
Flag: disallow page break between this paragraph and the next one.
TopBorder
BottomBorder
LeftBorder
RightBorder
Argument is a rule style as described in 2.1.4. ``Rules and borders''. Specifies that a rule is to be drawn above, below, to the left, or to the right of the paragraph.
InnerBorders
Flag: border specifications apply to individual paragraphs.

In addition, all of the character formatting style attributes (see 2.2. ``Character styles'') may be specified for a paragraph style.

TIP -- To achieve ``hanging indentation'', use a negative value for FirstIndent.

NOTE -- If a series of successive paragraphs specify the same set of borders, the borders are drawn around the group as a whole unless the InnerBorders flag is specified.
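
A hanging indent as described in the tip above might be declared as follows. The style id litem is only an example name, and the fragment assumes rtflib.tcl has been loaded.

```tcl
# The first line starts at the page margin (0.5in - 0.5in);
# all following lines are indented by 0.5in.
rtf:paraStyle litem "List Item" {
    LeftIndent 0.5in
    FirstIndent -0.5in
}
```

A style like this pairs naturally with a bullet or number followed by a tab at the start of the paragraph, so that the item text lines up with the indented lines.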

2.4 Section styles

rtf:sectStyle id "description" [ -basedon styleid ] {
    param value
    ...
}

Defines a section style, which may be referenced by id in a later call to rtf:startSection.

PageWidth
PageHeight
Dimensions: width and height of the page for this section. Default taken from document formatting properties.
LeftMargin
RightMargin
TopMargin
BottomMargin
Dimension: left, right, top, and bottom page margins. Default taken from document formatting properties.
Landscape
Flag: if set, page orientation is landscape instead of portrait for this section.

NOTE -- It is unclear how this affects the interpretation of the page size and margin control words.

HasTitlePage
Flag: if set, the first page of this section may use a different header and footer than the rest of the section. See 3.6. ``Page headers and footers''.
HeaderPosition
Dimension: Distance from the top of the page to the page header.
FooterPosition
Dimension: Distance from the bottom of the page to the page footer.

NOTE -- It is unclear whether these specify distance to the top or the bottom of the header and footer.

SectionBreak
One of the following symbols:
None
No explicit page break before this section
Page
Section starts on a new page
EvenPage
Section starts on the next even-numbered page
OddPage
Section starts on the next odd-numbered page
VAlign
Specifies the vertical alignment of the text with respect to the page margins. One of the following symbolic values:
Top
Ragged-bottom pages, aligned with the top margin (default).
Justify
Text is set flush-bottom.
Middle
Text is centered between the top and bottom margins.
Bottom
Text is aligned with the bottom margin.
PageNumbering
Determines the format for the PageNumber special control word (see 3.3. ``Special characters''). One of Arabic, UCRoman, LCRoman, UCAlpha, or LCAlpha, for arabic (decimal), uppercase roman numerals, lowercase roman numerals, uppercase letters, and lowercase letters, respectively.
RestartPageNumbers
Boolean: if set, page numbering starts over at this section.
FirstPageNumber
Integer: starting page number for this section if RestartPageNumbers is set (1).
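
As an illustration, the following fragment declares two section styles that differ only in page numbering. The ids frontmatter and mainmatter are invented for this sketch; only parameters documented above are used, and rtflib.tcl is assumed to be loaded.

```tcl
# Front matter: starts on an odd page, numbered i, ii, iii, ...
rtf:sectStyle frontmatter "Front Matter" {
    SectionBreak OddPage
    PageNumbering LCRoman
    RestartPageNumbers 1
    FirstPageNumber 1
}

# Main matter: same properties, but numbered 1, 2, 3, ...
rtf:sectStyle mainmatter "Main Matter" -basedon frontmatter {
    PageNumbering Arabic
}
```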

2.5 Document-wide formatting properties

The rtf:documentFormat command specifies formatting properties for the document as a whole. This command is optional; the default values for these parameters should be sufficient unless you need finer control over the layout.

rtf:documentFormat {
    param value
    ...
}

Available document formatting parameters are:

PaperWidth
PaperHeight
Dimension: specify the width and height of the paper. Default depends on the output device; typically US letter (8.5in by 11in).
PaperSize
Shorthand way of specifying PaperWidth and PaperHeight. One of the symbolic values A4, A5, B5, Letter, Legal, or Executive.
LeftMargin
RightMargin
TopMargin
BottomMargin
Dimension: specifies the left, right, top, and bottom page margins. May be overridden on a per-section basis.
TwoSide
Flag: if set, specifies two-sided format. This affects headers and footers; see 3.6. ``Page headers and footers''.
MirrorMargins
Flag: if set, swaps the values of LeftMargin and RightMargin on verso (even-numbered) pages. MirrorMargins is only valid if TwoSide is set.

NOTE -- The RTF spec says only that this control word ``switches margin definitions on left and right pages,'' which is ambiguous. By experimentation, LeftMargin corresponds to the ``inner'' margin and RightMargin corresponds to the ``outer'' margin, at least in Word for Windows 95 Version 7.

Landscape
Flag: if set, the entire document is set with landscape orientation.

NOTE -- It is unclear how the top, bottom, left, and right margins are interpreted if landscape orientation is specified.

Protection
Enables ``document protection'' for programs which support this feature. Allowable values are:
AllProtected
Document may not be modified.
Annotations
Document may be annotated but not modified.
Revisions
Document may be modified, but revision tracking is enabled.
Hyphenate
Boolean: enables automatic hyphenation for this document.
HyphenationHotZone
Dimension: specifies the ``hyphenation hot zone,'' the distance from the right margin in which words may be hyphenated.
HyphenationLadderCount
Integer: maximum allowable number of consecutive hyphenated lines.
HyphenateAllCaps
Boolean: allow hyphenation for words consisting of all capital letters if set.

See also 3.7. ``Footnotes''.
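
A typical call might look like the sketch below. The values are arbitrary examples, only parameters listed above are used, and rtflib.tcl is assumed to be loaded.

```tcl
# Document-wide defaults; call this before rtf:start.
rtf:documentFormat {
    PaperSize A4
    LeftMargin 1in
    RightMargin 1in
    TopMargin 1in
    BottomMargin 1in
    Hyphenate 1
    HyphenationLadderCount 2
}
```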

2.6 Special style names

Word uses the paragraph style names Heading 1, Heading 2, etc., as the source text for building a table of contents. Use these names as the description for heading entries in the text to facilitate automatic TOC generation.

Word applies the paragraph styles TOC 1, TOC 2, etc., to entries in automatically-generated tables of contents. If you include definitions for these styles in the stylesheet, Word will use them to format the table of contents.

3 RTF output commands



Call rtf:start after all declarations and before writing any output. Call rtf:end at the end of processing.

rtf:start

Begins the top-level RTF group, emits the style sheet and other header information, and sets any document-wide formatting properties specified by rtf:documentFormat.

rtf:end

Closes the top-level RTF group. Must be called at the end of processing.

3.1 Document structure

The basic unit of text in RTF is the paragraph. In RTF, a paragraph is any block of displayed text -- including section headings, list items, and table cell entries -- not necessarily a conventional paragraph.

rtf:startPara styleid
# generate paragraph text...
rtf:endPara

rtf:startPara and rtf:endPara delimit the start and end of paragraphs. styleid is the name of a paragraph style defined with rtf:paraStyle. Since paragraphs do not nest, rtf:endPara is optional.

rtf:startPhrase styleid
# ...
rtf:endPhrase

Use rtf:startPhrase and rtf:endPhrase to apply special formatting to text within a paragraph. styleid is the name of a character style defined with rtf:charStyle.

rtf:endPhrase is not optional. Phrase boundaries must not cross paragraph boundaries. (Actually RTF doesn't care if they do, but this confuses RATFINK.)

RTF documents may optionally be broken into sections.

rtf:startSection styleid
rtf:endSection

styleid is a section style declared with rtf:sectStyle. Since sections do not nest in RTF, rtf:endSection is optional.

3.2 Text

rtf:text "text"

Writes text to the output file, escaping backslashes and braces so they are not interpreted as RTF markup.

rtf:text makes sure that the output is inside a paragraph. If not, it starts a new paragraph and issues a warning.

rtf:text also replaces sequences of two consecutive hyphens with an en-dash, three hyphens with an em-dash, two backquotes (`) with a left double quote, and two apostrophes (') with a right double quote.
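
The kind of processing described here can be sketched in plain Tcl with string map. The procedure names escapeRtf and smartPunct are hypothetical, and this is not the code in rtflib.tcl; in particular, the choice of the RTF control words \emdash, \endash, \ldblquote, and \rdblquote is an assumption about what a replacement could emit.

```tcl
# Hypothetical sketch; not RATFINK's implementation.

# Escape the three characters that RTF treats as markup.
proc escapeRtf {text} {
    return [string map [list \\ \\\\ \{ \\\{ \} \\\}] $text]
}

# Replace ASCII conventions with RTF control words. The three-hyphen
# key is listed first so that it wins over the two-hyphen key.
proc smartPunct {text} {
    return [string map [list --- "\\emdash " -- "\\endash " `` "\\ldblquote " '' "\\rdblquote "] $text]
}

# Escape first, then substitute, so that the backslashes introduced by
# smartPunct are not themselves escaped.
puts [smartPunct [escapeRtf {``C:\temp'' -- see {1}}]]
```

The order of the two steps matters: applying escapeRtf after smartPunct would turn the generated control words into literal text.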

rtf:insert data

Inserts data into the current paragraph verbatim, leaving backslashes and braces as-is.

rtf:write "data"

rtf:write inserts data into the output verbatim. data may contain RTF control codes.

NOTE -- Be very careful when using rtf:write to generate RTF commands directly.
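
Putting the commands from 3.1 and 3.2 together, a stand-alone script without Cost might look like the sketch below. It assumes that rtflib.tcl sits in the current directory and can be loaded with source; the style ids body and emph are invented for the example.

```tcl
# Assumption: rtflib.tcl is in the current directory; adjust the path.
source rtflib.tcl

# Declarations come first.
rtf:paraStyle body "Body Text" {
    Font roman
    FontSize 10pt
}
rtf:charStyle emph "Emphasis" {
    Italic 1
}

# Then the output commands, bracketed by rtf:start and rtf:end.
rtf:start
rtf:startPara body
rtf:text "Braces {like these} and backslashes are escaped by "
rtf:startPhrase emph
rtf:text "rtf:text"
rtf:endPhrase
rtf:text "."
rtf:endPara
rtf:end
```

Note that the phrase is opened and closed inside a single paragraph, as required above.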

3.3 Special characters

The rtf:special command inserts a special character into the output stream, ensuring that the output is currently inside a paragraph.

The global array rtfSpecial maps symbolic character names to the corresponding RTF control words.

rtf:special name
rtf:insert $rtfSpecial(name)

name is one of the following symbolic names:

Tab
``Hard'' tab
LineBreak
``Hard'' line break (carriage return)
EmDash
A wide or ``em'' dash
EnDash
A short or ``en'' dash
EmSpace
Space the width of an em dash
EnSpace
Space the width of an en dash
Bullet
Filled dot (for lists, etc.)
LSQuote
Left (opening) single quote
RSQuote
Right (closing) single quote
LDQuote
Left (opening) double quote
RDQuote
Right (closing) double quote
PageNumber
Current page number (useful in page header and footer)
SectionNumber
Current section number (not useful at all)
FootnoteNumber
Current footnote number

NOTE -- The $rtfSpecial array may also be referenced in prefix and suffix parameters in Cost specifications, for example.

3.4 Miscellaneous

rtf:tab
rtf:lineBreak
rtf:pageBreak
rtf:columnBreak

These commands generate a ``hard'' tab, line break, page break, and column break control word, respectively. rtf:tab and rtf:lineBreak may only be used inside a paragraph.

3.5 Destination groups

RTF destination groups are used for text that does not appear in the main flow; e.g., page headers or footnotes. The rtf:divert command starts a new destination group.

rtf:divert destination
# generate data for destination...
rtf:undivert

See 3.6. ``Page headers and footers'' and 3.7. ``Footnotes'' for more information.

3.6 Page headers and footers

Header and footer text is specified in destination groups. There are several different destinations related to headers and footers; which ones are applicable depends on various document and section style properties.

The Header and Footer destination groups specify the default header and footer, respectively. LeftHeader, LeftFooter, RightHeader and RightFooter specify the header and footer for left (verso) and right (recto) pages; these are only applicable if the TwoSide document formatting property is set. FirstPageHeader and FirstPageFooter specify the header and footer for the first page of the section; these are only applicable if the HasTitlePage section formatting property is specified for the section.

Headers and footers should be specified immediately after the call to rtf:startSection. If a particular applicable header or footer is not specified, then it is inherited from the previous section.

Headers and footers contain ordinary paragraph text.

The PageNumber special character may be useful in headers and footers.
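
For instance, a default page header with a title on the left and the page number on the right could be generated as sketched below. The section style chapter and the paragraph style header are assumed to have been declared earlier, with header using a tab-stop set like threepart from 2.1.3 (a centered stop followed by a right-aligned stop).

```tcl
rtf:startSection chapter

# Everything between divert and undivert goes into the Header destination.
rtf:divert Header
rtf:startPara header
rtf:text "User Guide"
rtf:tab
rtf:tab
rtf:special PageNumber
rtf:endPara
rtf:undivert
```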

3.7 Footnotes

Footnotes are generated with a Footnote destination group.

rtf:special FootnoteNumber
rtf:divert Footnote
# generate footnote text ...
rtf:undivert

Footnotes are ``anchored'' to the character that immediately precedes the destination group. Use the FootnoteNumber special character to obtain automatically-numbered footnotes.

The following document-wide formatting properties affect how footnotes are formatted; they may be specified with the rtf:documentFormat command prior to the start of output.

FootnoteNumbering
Format for automatic footnote numbers. One of Arabic, UCRoman, LCRoman, UCAlpha, or LCAlpha, for arabic (decimal), uppercase roman numerals, lowercase roman numerals, uppercase letters, and lowercase letters, respectively.
FootnoteRestart
One of the following:
AtPage
Footnote numbers restart at the beginning of each page.
Continuous
Footnotes are numbered continuously.
AtSection
Footnote numbers restart at the beginning of each section.
FootnoteLocation
Specifies that footnotes appear other than at the end of the page. Possible values:
EndOfSection
Footnotes appear at the end of the section.
EndOfDocument
Footnotes appear at the end of the document.

If FootnoteLocation is not specified, footnotes appear at the bottom of each page.

FootnotePlacement
Specifies placement of footnotes on the page. One of:
PageBottom
Footnotes are placed at the bottom of the page.
BeneathText
Footnotes appear directly beneath the text.

NOTE -- RTF also has ``alternate'' footnotes, used to put both footnotes and endnotes in the document. RATFINK does not support alternate footnotes.

3.8 Bookmarks

RTF allows sections of text to be defined as a bookmark. It is not clear from the RTF specification what this feature does; presumably it is used by word processing software.

rtf:startBookmark name
rtf:endBookmark name

The bookmark name may be any character data. rtf:startBookmark must be followed by a matching rtf:endBookmark; bookmarks may overlap, however.

3.9 Tables

NOTE -- Table support is still in beta. This interface has not been very well tested or debugged, and is subject to change.

In RTF, a table is a consecutive series of rows, each of which contains a series of cells. Cells contain either a series of one or more paragraphs or inline text.

RTF has no explicit control words for the beginning and end of a table. Instead, tables are specified as a sequence of rows. Cell properties (sizes and rules) are specified all at once at the beginning of each row, followed by the cells themselves.

RATFINK makes the following simplifying assumptions:

rtf:startTable
    ( -numcols n
      | -abswidths "w1 w2 ... wn"
      | -relwidths "w1 w2 ... wn"  )
    [ -width dimension ]
    [ -frame rulestyle ]
    [ -rowsep rulestyle ]
    [ -colsep rulestyle ]
    [ -align (Left|Right|Center) ]

# ...
rtf:endTable

rtf:startTable begins a table. Exactly one of -numcols, -relwidths, or -abswidths must be specified to define the number of columns in the table.

-numcols n
Specifies that the table has n equal-width columns.
-abswidths "w1 w2 ... wn"
Specifies that the table has n columns, with the specified widths, where each w is a dimension specification.
-relwidths "w1 w2 ... wn"
Specifies that the table has n columns. The width specifiers w1 through wn are integers. Each w is interpreted as a relative width and the total width of the table is proportionally divided among the columns.
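
The proportional division used for -relwidths can be illustrated with a few lines of plain Tcl. The procedure splitWidths is hypothetical and is not taken from rtflib.tcl; it only shows the arithmetic, with the total width given in twips.

```tcl
# Hypothetical sketch; not RATFINK's implementation.
# Divides totalTwips among the columns in proportion to relWidths.
proc splitWidths {totalTwips relWidths} {
    set sum 0
    foreach w $relWidths { incr sum $w }
    set result {}
    foreach w $relWidths {
        lappend result [expr {round(double($totalTwips) * $w / $sum)}]
    }
    return $result
}

# A 6.5in (9360 twip) table split 1:3.
puts [splitWidths 9360 {1 3}]
```

This prints 2340 7020, that is, a quarter and three quarters of the table width.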

The other options are:

-width dimension
Specifies the total width of the table. Ignored if -abswidths is specified. Default: the width of the page. (Actually, the default is only a guess at the page width, since RATFINK does not currently keep track of this...)
-frame rulestyle
-rowsep rulestyle
-colsep rulestyle
Specifies the default outer borders, default horizontal rules between rows, and default vertical rules between columns. rulestyle is the name of a rule style previously defined with rtf:ruleStyle. (Note: -frame only works for the top, left, and right table borders; the bottom border must be re-specified on the final row.)
-align (left|right|center)
Specifies the alignment of the table as a whole relative to the page.
rtf:startRow [ -colspans "s1 s2 ... sm" ]
    [ -toprule rulestyle ]
    [ -botrule rulestyle ]
    [ -colsep rulestyle ]
    [ -rowheight dimension ]
    
# ...
rtf:endRow

rtf:startRow begins a new table row.

-colspans "s1 s2 ... sm"
Specifies that this row contains only m cells; cell i spans si columns of the table. The sum of s1 through sm must equal the total number of table columns.
-toprule rulestyle
Specifies the rule above this row. Only legal for the first row in the table. Default: the table -frame option if this is the first row, the bottom rule of the preceding row otherwise.
-botrule rulestyle
Specifies the rule below this row. Default: the -rowsep rule style for the table.
-colsep rulestyle
Specifies the style of all vertical rules between cells for this row. Default: the table -colsep value.
-height dimension
Specifies the minimum height of this row. Default: The height of the enclosed text.

rtf:endRow and rtf:endCell are optional.

rtf:startCell [ paraStyle ]
# ...
rtf:endCell

Begins a new cell. Cells can contain inline text, or a series of paragraphs.

rtf:endCell is optional. It marks the end of the current cell.
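
The following sketch puts the table commands together for a two-column table. It assumes a paragraph style body has been declared, and it uses the predefined rule style thin. Since table support is described above as still in beta, treat this as an outline rather than tested usage.

```tcl
rtf:startTable -relwidths "1 3" -frame thin -rowsep thin -colsep thin

# First row: two ordinary cells.
rtf:startRow
rtf:startCell body
rtf:text "Option"
rtf:endCell
rtf:startCell body
rtf:text "Meaning"
rtf:endCell
rtf:endRow

# Final row: one cell spanning both columns. The bottom border is
# given explicitly because -frame does not cover it.
rtf:startRow -colspans "2" -botrule thin
rtf:startCell body
rtf:text "A single cell spanning both columns"
rtf:endCell
rtf:endRow

rtf:endTable
```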

3.10 Fields

A field in RTF is a hook for specifying program-specific commands to the word processor reading the RTF file. Fields contain two parts: a field instruction, which is the actual command; and an optional field result, which holds the results of processing the field. (The field result may be used to provide default text in case the application does not understand how to process the field instruction.)

There are two ways to insert fields with RATFINK:

rtf:insertField {instruction} [ "result text..." ]

Inserts a field instruction. instruction is any character data; backslashes and other special characters will be escaped before writing to the output. The second parameter is optional; if supplied it will be used as the text of the field result.

rtf:startField "field instruction"
# ... generate field result ...
rtf:endField

Like rtf:insertField, except the field result may contain arbitrary RTF text instead of a simple character string.

NOTE -- Many field instructions have optional parameters that are specified with sequences beginning with a backslash. Note that these are not RTF control words (except for \fldalt, but that's too horrifying to get into...).

The available field instructions vary from application to application; check the documentation for the program in question.
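
As an example, the sketch below inserts two fields whose instructions, TOC and PAGE, are field codes understood by Microsoft Word; other applications may ignore them and show only the result text. The paragraph style body is assumed to exist, and wrapping each field in a paragraph is a cautious assumption rather than a documented requirement.

```tcl
# Table of contents built from Heading 1 through Heading 3
# (see 2.6. ``Special style names''). The \o switch belongs to the
# field instruction, not to RTF; rtf:insertField escapes it.
rtf:startPara body
rtf:insertField {TOC \o "1-3"} "Update this field to build the table of contents."
rtf:endPara

# Current page number, with "1" as the fallback result text.
rtf:startPara body
rtf:text "Page "
rtf:insertField PAGE "1"
rtf:endPara
```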

4 SGML Conversion



The commands described in the previous section and defined in rtflib.tcl are all general-purpose Tcl utilities for creating RTF, and may be used independently of Cost or SGML. The file RTF.spec is a high-level Cost script to assist in converting SGML to RTF.

rtf:convert specname

To specify the processing for a particular DTD, define a Cost specification supplying RATFINK processing parameters for each element type, then call the rtf:convert command with the name of your specification. (This is in addition to defining a stylesheet and document formatting properties as described above.)

NOTE -- A Cost specification maps document nodes to parameters based on queries; see the Cost reference manual for full details.

For example:

specification rtfSpec {
  {element P} {
    rtf para  
    paraStyle body 
  }
  {elements "DFN EM"} {
    rtf phrase
    charStyle hp0 
  }
  {elements "UL OL"} {
    rtf #IMPLIED
  }
  {element LI} {
    rtf para
    paraStyle litem
  }
  {element LI in UL} {
    prefix {$rtfSpecial(Bullet)$rtfSpecial(Tab)}
  } 
  {element LI in OL} {
    prefix {[childNumber].$rtfSpecial(Tab)}
  }
  {element PRE} {
    rtf linespecific
    paraStyle verbatim
  }
  {element H1} { rtf para  paraStyle heading1 }
  {element H2} { rtf para  paraStyle heading2 }
  ... etc.
}

rtf:convert rtfSpec

TIP -- rtf:convert is reentrant and may be called recursively, possibly with a different specification, for complex processing.

There is one mandatory parameter for every element: rtf. This specifies one of the following ``architectural forms'':

para
A paragraph or other displayed block. The required parameter paraStyle specifies the id of the paragraph style to use for this element.
phrase
Inline text with special character formatting. The required parameter charStyle specifies the id of the character style to use for this element.
section
A section. The required parameter sectStyle specifies the id of the section style to use for this element.
linespecific
A displayed block in which record-ends (newlines) are significant.
special
Special-purpose processing. Use the parameters startAction and endAction to specify how to process this element.
#IMPLIED
No special processing for this element.

The optional parameters startAction and endAction are valid for every element. They specify Tcl code to execute at the start and end of the element, respectively. The code is evaluated at global scope.

Tcl variable and command substitution is performed on the charStyle, paraStyle, and sectStyle parameters.

NOTE -- The RTF translation routine prints a warning if there is no rtf parameter specified for an element.

4.1 Generated text

The following parameters may be specified for any element node, and may be used to specify automatically-generated text:

before
Data to insert before processing the element
prefix
Data to insert at the beginning of the element, after the start-of-element processing.
suffix
Data to insert at the end of the element, before the end-of-element processing.
after
Data to insert after processing the element.

Tcl variable and command substitution is performed on these parameters with the subst command. The result is inserted directly into the output file and may contain RTF control words. You should use the rtf:Escape command if the value might contain data that looks like an RTF control instruction; for example,

  {element IMG} {
    rtf #IMPLIED
    prefix {[rtf:Escape [q attval ALT]]}
  }

4.2 Nested paragraphs

Paragraphs do not nest in RTF, but they do in many SGML applications. For example, it is often legal to include

and other displayed material in the middle of a paragraph.

For para-form elements, the optional continuedStyle parameter names a paragraph style for subsequent blocks of text that are part of a logical paragraph in the SGML document but are treated as separate paragraphs in RTF.
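
A specification entry using continuedStyle might look like this sketch. The paragraph styles body and bodycont are assumed to have been declared with rtf:paraStyle; bodycont could, for example, be based on body but without a first-line indent.

```tcl
specification rtfSpec {
  {element P} {
    rtf para
    paraStyle body
    continuedStyle bodycont
  }
}
```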

4.3 Line-specific text

Normally, record-ends are converted to spaces. If rtf linespecific is specified for an element, then record-ends are processed as hard line breaks. The element is formatted as a single paragraph, and the paraStyle parameter also applies.
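
For instance, a preformatted listing might be mapped as sketched below. The element type PRE and the style id listing are hypothetical, and the style is assumed to be declared elsewhere with rtf:paraStyle.

```tcl
specification rtfSpec {
  {element PRE} {
    rtf linespecific
    paraStyle listing
  }
}
```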

4.4 Sections

If rtf section is specified for an element, RATFINK starts a new section with rtf:startSection at the beginning of the element. (It does not call rtf:endSection at the end of the element, since in general sections may nest in an SGML document while they may not in RTF; keep this in mind.)

The startAction and endAction parameters are also evaluated for section-form elements. These parameters contain arbitrary Tcl code, evaluated at top-level (global) scope. (startAction may be used to generate headers and footers, for example.)
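
A sketch of a section-form entry follows. The element type CHAPTER, the style id chapter, and the global variable chapterCount are hypothetical; the section style is assumed to be declared elsewhere in the style sheet.

```tcl
# Hypothetical counter; initialize it before calling rtf:convert.
set chapterCount 0

specification rtfSpec {
  {element CHAPTER} {
    rtf section
    sectStyle chapter
    startAction { incr chapterCount }
  }
}
```

Since the action runs at global scope, chapterCount here refers to the global variable set at the top of the script.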

4.5 Data entities

The processing of data entities and data entity references is controlled by the content parameter, which is evaluated as Tcl code at global scope. For example:

{dataent withdcn EPS} {
  content {
    rtf:insertField "INCLUDEPICTURE \"[query sysid]\"" {Picture goes here...}
  }
}

4.6 RATFINK architectural forms

The following meta-DTD describes the basic structure of the RATFINK output process:

<!-- Meta-DTD for RATFINK RTF conversion -->
<!doctype ratfink [
<!element ratfink	- - (section+|para+)>
<!element section 	- O (headings?, para+)>
<!attlist section
    sectStyle	CDATA	#REQUIRED	
>
<!element para		- O (#PCDATA|phrase)*>
<!attlist para
    paraStyle	CDATA	#REQUIRED	
>
<!element phrase	- - (#PCDATA)>
<!attlist phrase
    charStyle	CDATA	#REQUIRED
>
<!entity % headings "(fphead|fpfoot|head|foot|lhead|lfoot|rhead|rfoot)">
<!element headings	O O ( fphead? & fpfoot? & 
				((head? & foot?) | 
				 (lhead? & lfoot? & rhead? & rfoot?)) ) 
>
<!element %headings;  - -  (para+ | (#PCDATA|phrase)*)>
]>

Note that this DTD is not actually used by RATFINK; it is for descriptive purposes only.

Conceptually, the mapping from source document elements onto architectural forms is determined by the rtf parameter, which specifies the result element type; other parameters correspond to result attributes. The %headings; architectural forms do not correspond to source elements; they are instead generated by the application.

5 Bugs, limitations, and oddities

There is currently no way to set formatting properties for a section, paragraph, or phrase without defining a stylesheet entry.

Handling nested lists and other such things is more difficult than it ought to be.

RATFINK does not always output control words in the order prescribed by the RTF syntax productions (but neither does Microsoft Word, for what it's worth...)

There is no support for pictures, drawing objects, embedded objects, or other features.

NOTE -- I'd really like to support bitmapped images, but the RTF spec is extremely unhelpful on this point.

RATFINK does not handle context-sensitive style information very well. For example, if the DTD allows bulleted lists inside regular paragraphs and inside notes, and the desired formatting is to set regular paragraphs in a roman font and notes in a sans-serif font, then there must be distinct RTF paragraph styles for lists inside notes and lists inside regular paragraphs.

If a style overrides a parameter in its base style, the corresponding control word will be emitted more than once. Since the later setting takes precedence, this usually makes no difference, but it means that ``flag'' control words cannot be turned off if they are turned on in a base style, and that tab stops in a base paragraph style may not be cleared.

I've done my best to make sure that this library only generates legal RTF (as far as my understanding of the specification goes), but it is still possible in certain obscure circumstances for the output to crash Word.

RTF has control words to embed table of contents entries, and index entries; however, there are no control words to build a table of contents or index. Consequently, I haven't bothered to support these features in RATFINK.

NOTE -- With Word for Windows 95 Version 7 you can do these things with field instructions, so they may be useful after all.

RTF supports automatic numbering of lists and headings, but not very well. For example, if you include a paragraph inside a list item, Word resets the counter for the next item; and if you have two numbered lists in a row with no intervening paragraphs, there is no way to restart the list numbers at 1. Consequently, I haven't bothered to support these features either.

Many SGML applications assume an application convention whereby multiple spaces are equivalent to a single space. Many text formatting utilities (TeX, n/troff, Scribe, etc.) work this way, but RTF does not: all spaces are significant. By default, RATFINK does nothing to compress multiple spaces; however, you can do some tricks with short reference maps to take care of this in the parser.
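
As a different technique from short reference maps, whitespace can also be collapsed in Tcl before text is written out. The helper below is a hypothetical sketch and not part of RATFINK; how it would be hooked into the conversion is left open.

```tcl
# Hypothetical helper, not part of RATFINK: collapse every run of
# spaces, tabs, and newlines in a string to a single space.
proc compressSpaces {text} {
    # The pattern is written in double quotes so that Tcl itself
    # converts \t and \n to the tab and newline characters.
    regsub -all "\[ \t\n\]+" $text " " result
    return $result
}

puts [compressSpaces "multiple   spaces\nand a   newline"]
```

Run under tclsh, this prints "multiple spaces and a newline" on a single line.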

The output of RATFINK has been extensively tested with Microsoft Word for Windows 95 Version 7, and to some extent with Microsoft Word for Macintosh Version 5.1a. I have no idea how well it will work, if at all, with other word processors; chances are good that there will be differences in other applications' interpretations of the specification.

It takes a lot of work to get any sort of decent typography out of Word.

6 References and related material

Information about Cost can be found at http://www.flightlab.com/cost/.

Another tool for converting SGML to RTF is JADE, James Clark's amazing DSSSL engine. See http://www.jclark.com/ for details. This program works under Win32 and most Unix variants.

The RTF format is defined in the Application Note GC0165, available on the Microsoft FTP server at ftp://ftp.microsoft.com/Softlib/mslfiles/gc0165.exe.

NOTE -- The 1.4 RTF spec also includes a sample RTF reader program, but it isn't very good; Paul Dubois' RTF tools are a better bet.

The Microsoft material is only supplied as a self-extracting DOS executable. If you don't have a DOS system available, you're not completely out of luck: the Info-ZIP project UNZIP utility runs on just about every system imaginable and is able to unpack this format. See ftp://ftp.uunet.net/pub/archiving/zip/ and elsewhere (ask Archie; it's widely mirrored). You'll still need a copy of Microsoft Word to read the RTF specification, though.

Information about WINHELP may be found in the Windows Help Authoring Toolkit at ftp://ftp.microsoft.com/Softlib/mslfiles/what6.exe, and in the Usenet newsgroup comp.os.ms-windows.programmer.winhelp and its related FAQ.

WARNING -- WINHELP is not for the faint of heart or weak of stomach. RTF is pretty messed up, but WINHELP is a complete abomination.

To parse RTF files, check out Paul Dubois' excellent RTF tools at http://www.primate.wisc.edu/software/RTF/. See also rtftohtml at http://www.sunpack.com/RTF/.

OSF's ``Rainmaker'' software can convert RTF documents to the Rainbow SGML document type; see ftp://ftp.ebt.com/pub/nv/dtd/rainbow/

7 Acknowledgments

Many thanks to Boris Tobotras for testing, bugfixes, and enhancements.