Encoded files, formatted files, portability, etc...

Jim Croft jrc at ANBG.GOV.AU
Sat Feb 13 11:07:10 CST 1993


The recent exchange on taxacom on posting of uuencoded files touched on
a topic that has been bugging me for some time: the tendency of people
to invoke the arcane rites of wordprocessors where a simple text editor
would be quite sufficient.

Most electronic writing in our organization, which I doubt is much
different from any other government, academic or private bureaucracy, is
concerned with minutes and instructions that will be ignored by
management and staff alike, or with manuscripts that will be butchered
by editors, reviewers and other low-life.  Such documents are not really
enhanced by a 16.5 point grotesque italic font in a double-bordered
shaded shadowed box.  But people insist on doing this kind of thing,
simply because it can be done.

For those of you who follow such things, there has been some fairly
intense discussion between librarians and information professionals on
net groups such as pacs-l, about the relative importance of 'form' and
'content'.  This is an on-going difference of view and there does not
appear to have been a resolution.

The problem we are facing is how to make a piece of scientific
information available to the widest possible audience in a form that
they can do something useful with.  At the moment our strategy is to
place anything botanically useful or interesting on a gopher information
server (more on this later), in a form that *any* system can access and
use.  Because of the almost total lack of compatibility between systems
we are forced to use the lowest common level of human understandable data
representation: 7-bit ascii.  That means no bold, no italics, no
underlining, no fancy paragraphs or page layout, no fonts, no graphics,
no diacritics, etc.; just simple, raw, unembellished, naked information.
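
As a rough illustration of the 7-bit constraint, the following Python
sketch reports every character in a text that falls outside the ASCII
range.  The function name is made up for this example and is not part of
any tool we use.

```python
def non_ascii_characters(text):
    """Return (line number, column, character) for each non-ASCII character."""
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:  # 7-bit ASCII covers code points 0 to 127
                problems.append((line_no, col, ch))
    return problems


# An empty list means the text is safe to post as plain ASCII.
print(non_ascii_characters("plain text\ncaf\u00e9"))
```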

The first attempt at achieving this was to produce ascii exports of the
word-processed files.  This sort of worked but was not without problems.
People specified paragraphs in different ways, tabs were different
widths, page breaks were handled differently etc.  All this meant that
the ascii export files had to be re-edited to a greater or lesser extent
to make them generally presentable on screen and paper.

To try and reduce the amount of re-editing and introduce a degree of
uniformity between documents prepared on different word-processors, a
rough style manual was thrown together.  It is appended below.

We would really appreciate comments on the concept and the content of
such a manual to make it more flexible and usable.

cheers

jim

------------------------ Cut here --------------------------
DRAFT

     Australian National Botanic Gardens

     Electronic Document Style Manual
     ================================

     Prepared by Jim Croft, January 1993


     Background
     ----------
     Most documents of substance these days are prepared electronically
     and often there is a desire or need to place these documents on the
     network to achieve wide and timely distribution.

     Problems of incompatibility arise because documents are prepared on
     a variety of word-processors and text editors and are received for
     storage and display by a large number of combinations of terminals
     and display applications (such as word-processors).

     Word-processors allow users to set attributes of characters,
     paragraphs, pages, and documents but although there are some
     conventions, they are not standard and different word-processing
     applications implement the same thing in different ways.

     Moreover, text editors and word-processors allow users to create
     documents of similar appearance using different techniques.  This
     creates problems when documents are moved and converted from one
     environment to another.

     The layout of a document depends very much on the purpose for which
     it is intended, the objective being to minimize the amount of
     reprocessing required to load, read or incorporate the text.

     Helpful tips on layout can be found in manuals covering the
     preparation of typewritten reports.  A chapter on this occurs in
     the earlier editions of the Australian Government Printing
     Service 'Style manual for authors, editors and printers of
     Australian government publications'.  Interestingly, in the latest
     edition this chapter has been omitted in favour of more detailed
     accounts of electronic desktop publishing.

     The following suggestions cover some topics that may make a
     document more readable as an ASCII text file, and less likely to
     require intensive manual editing when imported into another
     wordprocessing application.

     Pagination
     ----------
     Pagination is not a critical matter in electronic text where linear
     scrolling through the document is possible.  If you choose to
     paginate a document, bear in mind there is no international paper
     size and documents formatted for the old world A4 paper size may not
     print comfortably on the new world 8.5 x 11 inch paper size.

     For this reason, if page breaks are essential, it is preferable to
     use the ASCII page break code, ^L, rather than the appropriate
     number of linefeeds required to place the text on the next page.
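
     As a rough sketch of the point, a file that marks page breaks with
     form feeds can be expanded to suit any page length at print time,
     whereas hard-coded linefeeds suit one paper size only.  The
     function name below is ours, chosen for illustration.

```python
FORM_FEED = "\f"  # ASCII code 12, typed as ^L


def expand_page_breaks(text, lines_per_page):
    """Replace form feeds with the blank lines needed to fill each page."""
    pages = []
    for page in text.split(FORM_FEED):
        lines = page.strip("\n").split("\n")
        # Pad short pages with empty lines; longer pages are left as they are.
        lines += [""] * max(0, lines_per_page - len(lines))
        pages.append("\n".join(lines))
    return "\n".join(pages)
```

     The same source file can then be expanded with one value of
     lines_per_page for A4 and another for 8.5 x 11 inch paper.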

     Running headers and footers may make a printed document look
     'nicer', but they are irritating when reading an electronic
     document a screenful at a time.

     Page Size
     ---------
     As some terminals and emulations are not resizable and have a
     screen width limited to 80 columns, and some printers truncate (or
     wrap awkwardly) lines longer than this, the effective workable
     line width should be kept within this limit.
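
     A minimal Python sketch of a check for this limit follows.  The
     function name is hypothetical; the default of 80 comes from the
     screen width mentioned above.

```python
MAX_WIDTH = 80  # column limit taken from the 80-column terminals above


def overlong_lines(text, max_width=MAX_WIDTH):
    """Return (line number, length) for every line wider than max_width."""
    return [
        (line_no, len(line))
        for line_no, line in enumerate(text.split("\n"), start=1)
        if len(line) > max_width
    ]
```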

     The page length perhaps should not be specified, following the
     comments above.  If a document is formatted for a particular paper
     size it will look awkward on the screen; if formatted for a 24 or
     25 line screen it will look awkward when printed.

     Layout, Columns
     ---------------
     To make a document more readable on the screen, it should be
     indented on the left margin (say 2 - 5 spaces).  To incorporate
     it into a word-processor the indent spaces on each line will need
     to be globally removed.  Each line of text should fall short of
     the margin by a comparable amount.
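
     Adding and globally removing the indent need not be done by hand.
     The sketch below uses Python's standard textwrap module; the two
     function names and the default of 5 spaces are our own choices.

```python
import textwrap


def add_margin(text, spaces=5):
    """Indent every non-blank line by the given number of spaces."""
    return textwrap.indent(text, " " * spaces)


def strip_margin(text):
    """Remove the leading whitespace common to all non-blank lines."""
    return textwrap.dedent(text)
```

     Because only the common indent is removed, any deeper indentation
     used for structure within the document survives the round trip.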

     Multiple columns of text may look good in a printed newsletter but
     are next to impossible to read on an 80 x 25 screen.  The document
     should be a single text stream of a single column.

     Hyphenation
     -----------
     Hyphens to break words should be avoided as there is no standard to
     handle optional hyphenation in ASCII files.  When imported into a
     text editor or word-processor, the hyphens will almost certainly be
     in the wrong place and have to be removed.

     Fonts, Style and Size
     ---------------------
     The font style and size of a letter have little meaning in an ASCII
     text file.  To get an idea of what a file will look like, compose
     it in 12 point courier without attributes such as bold, italic,
     underline, etc.  Courier is recommended because it is fixed width;
     proportional fonts such as helvetica and times roman will not line
     up in the same manner as fixed width fonts.

     Italics, Bold, Underline
     ------------------------
     There is no standard way of displaying or printing character
     attributes such as bold, italic and underline; what works on one
     printer or screen may not work on another.

     Some conventions have been adopted on the network and may be
     appropriate.  _Italic_ and _underlined_ text can be indicated by
     surrounding the text with the underscore character.  *Bold* or
     *emphasized* text can be indicated by surrounding the text with an
     asterisk.

     In titles, it is possible to underscore words by placing a row of
     hyphens or equal signs (double underline) on a blank line beneath
     the title, as in this document.  This is not compatible with the
     underlining conventions of word-processors.
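
     These conventions are simple enough to apply mechanically.  A
     short Python sketch, with helper names invented for this example:

```python
def italic(text):
    """Mark italic or underlined text with surrounding underscores."""
    return "_" + text + "_"


def bold(text):
    """Mark bold or emphasized text with surrounding asterisks."""
    return "*" + text + "*"


def underline_title(title, char="-"):
    """Return the title with a row of `char` of equal length beneath it."""
    return title + "\n" + char * len(title)


print(underline_title("Electronic Document Style Manual", "="))
```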

     Wordspacing and Letter-spaced Words
     -----------------------------------
     In general, one blank space should be left between words (see
     Justification, below).  L e t t e r - s p a c e d   words are
     often effective in titles and sometimes for emphasis in the body of
     the text.  If two or more words are involved, they should be
     separated by three spaces.  With access to different font sizes,
     letter spacing is rarely used in word-processors.
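
     The spacing rule can be written as a one-line Python function.
     This is only a sketch and the function name is ours.

```python
def letter_space(phrase):
    """Put one space between letters and three spaces between words."""
    return "   ".join(" ".join(word) for word in phrase.split())


print(letter_space("style manual"))
```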

     Diacritic Marks and Symbols
     ---------------------------
     There are no universally accepted ways of displaying or printing
     letters with diacritic marks and so these should be avoided.

     On some printers it is possible to fake some diacritics with the
     use of the backspace function, but this does not work on
     terminals.

     Ligatures should be represented as two separate letters, ae, oe,
     etc.
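
     Where a source text already contains accented letters, they can be
     reduced to their base letters before posting.  The Python sketch
     below assumes the text is held as Unicode; the function name and
     the ligature table are ours.  Letters with no decomposition, such
     as the Scandinavian slashed o, pass through unchanged and still
     need attention by hand.

```python
import unicodedata

# Ligatures have no canonical decomposition, so they are mapped by hand.
LIGATURES = {"\u00e6": "ae", "\u00c6": "AE", "\u0153": "oe", "\u0152": "OE"}


def strip_diacritics(text):
    """Replace ligatures, then drop the combining marks left by decomposition."""
    for ligature, replacement in LIGATURES.items():
        text = text.replace(ligature, replacement)
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```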

     Tabs
     ----
     The tab character is very useful in aligning columns of text.
     Problems arise because tabs may have different widths depending on
     the environment or word-processor or text editor.  Common values
     are 8 characters, 5 characters or 0.5 inches, 10 characters or 1
     inch.  Problems arise when tabs laid out in one environment are
     displayed in another - the columns often do not line up.
     Furthermore, the ASCII tab character (^I) left aligns text; there
     is no standard way to right align or decimal align ASCII text.

     It is recommended that tabs be avoided and text be laid out using
     spaces.  This of course works only with fixed width fonts and not
     when the document is formatted with a proportional font.
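
     Python's built-in str.expandtabs does this conversion.  The sketch
     below shows the same tabbed line landing on different columns under
     two of the tab widths mentioned above; the wrapper name is ours.

```python
def tabs_to_spaces(text, tab_width=8):
    """Replace each tab with spaces up to the next multiple of tab_width."""
    return text.expandtabs(tab_width)


row = "ab\tcd"
print(repr(tabs_to_spaces(row, 8)))
print(repr(tabs_to_spaces(row, 5)))
```

     Converting once, at the width the author used, fixes the layout for
     every reader regardless of their own tab settings.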

     Paragraphs
     ----------
     The left and right paragraph margins should be set following the
     document margin conventions listed above.

     Paragraphs can be separated in a number of ways: a blank line
     between each; indenting (say 5 spaces) the first line of each
     paragraph; the presence of a hard carriage return or linefeed; and
     so on.

     Paragraphs separated by blank lines are easier to pick out and
     conveniently separate large blocks of text.  Lines within a
     paragraph should end in a newline character and each paragraph
     separated by a blank line (two successive newline characters).

     It is not necessary to use both a blank line and an initial indent
     to delimit paragraphs.
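
     A text in this form can be re-wrapped to any line width without
     losing its paragraph structure.  A minimal Python sketch using the
     standard textwrap module follows; the function name and the
     default width of 70 are assumptions made for the example.

```python
import re
import textwrap


def reflow(text, width=70):
    """Rewrap each blank-line-separated paragraph to the given width."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs)
```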

     Line Spacing
     ------------
     Generally speaking text should be single spaced, with a double
     space between paragraphs.  For clarity and emphasis in some lists
     it may be appropriate to double space the items.  Incremental
     spacing is not defined in ASCII text and is not available on many
     terminals and printers.

     Justification, Centering
     ------------------------
     Block text should be left-justified.  While it is possible to
     pad the space between words to give the appearance of fully
     justified text, this is not done evenly and looks contrived;
     moreover, the inserted hard spaces upset the spacing when the text
     is imported into a word-processor.

     Centered text should be done by left padding with an appropriate
     number of spaces.
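
     The padding is easily calculated.  A small Python sketch, with a
     function name of our own and an assumed default width of 80
     columns:

```python
def center_line(text, width=80):
    """Left pad with spaces so the text sits centered in `width` columns."""
    padding = max(0, (width - len(text)) // 2)
    return " " * padding + text
```

     Unlike Python's own str.center, this adds no trailing spaces, so
     nothing invisible is left at the end of the line.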

     Titles, Headings
     ----------------
     Titles and headings should break up and draw attention to the
     relative importance of various parts of the text.  They should be
     employed liberally to give structure to a linear stream of text.
     A hierarchy of headings can be employed using a combination of
     centering, left and right justification, upper and lower case,
     letter spacing, underlining with hyphens or equal signs, etc.

     Further structure or emphasis can be achieved by indenting the
     paragraphs (say, an additional 2 - 5 spaces) at each level of the
     hierarchy.

     Tables
     ------
     Tables are awkward things and do not translate comfortably between
     different word-processors and display devices.  It is recommended
     that they be created using fixed width fonts and appropriate blank
     padding.  Boxes can be drawn around tables.

     Lines and Boxes
     ---------------
     Horizontal lines can be drawn with the characters _, -, = and
     vertical lines with |, !, I.  The plus sign, +, makes a nice
     intersection between the vertical and horizontal, | and -.
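
     Putting the last two sections together, a boxed table can be
     generated with blank padding and these three characters.  The
     Python sketch below is one possible layout, not a prescribed one;
     the function name is ours and every row is assumed to have the
     same number of cells.

```python
def boxed_table(rows):
    """Draw rows of strings as a table using the +, - and | characters."""
    widths = [max(len(cell) for cell in column) for column in zip(*rows)]
    rule = "+" + "+".join("-" * (w + 2) for w in widths) + "+"
    lines = [rule]
    for row in rows:
        cells = (cell.ljust(w) for cell, w in zip(row, widths))
        lines.append("| " + " | ".join(cells) + " |")
        lines.append(rule)
    return "\n".join(lines)


print(boxed_table([["Setting", "Value"], ["Line width", "80"]]))
```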

     Graphics
     --------
     There is no way to adequately handle graphics using the standard
     7-bit ASCII character set.  An approximation of graphics can
     sometimes be achieved by using characters such as:

          !@#%^()*_-+=[]|\/<>v0OoI,

     with appropriate blank padding to achieve alignment.  This will
     only work with fixed width fonts.



______________________________________________________________________________
Jim Croft                  [Herbarium CBG]           internet: jrc at anbg.gov.au
Australian National Botanic Gardens                     voice:  +61-6-2509 490
GPO Box 1777, Canberra, ACT 2601, AUSTRALIA               fax:  +61-6-2509 599
____Biodiversity Directorate, Australian National Parks & Wildlife Service____



