perm filename NEWPUB.OLD[2,TES] blob sn#035513 filedate 1973-04-08 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00021 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00003 00002	A PROPOSAL FOR THE NEW DOCUMENT SYSTEM
C00005 00003	BLOCK DIAGRAM
C00007 00004	THE MANUSCRIPT
C00009 00005	TEXT EXPRESSIONS
C00011 00006	GLYPHS
C00013 00007	THE DEVICE SPECIFICATION
C00015 00008	THE FORMATTER
C00017 00009	GALLEYS
C00019 00010	THE PAGINATOR
C00021 00011	THE POLISHER
C00022 00012	THE PRINTER/VIEWER
C00024 00013	THE REGISTRY
C00026 00014	GLYPH FILES
C00028 00015	THE MLISP EXTENSION
C00030 00016	ADVANTAGES AND DISADVANTAGES OF MLISP
C00033 00017	STANDARDS
C00034 00018	REALIZATION
C00037 00019	APPENDICES
C00038 00020	FIGURE 1
C00042 00021	FIGURE 2
C00043 ENDMK
C⊗;
A PROPOSAL FOR THE NEW DOCUMENT SYSTEM

Larry Tesler, Brian Harvey, Lester Earnest,
Tovar Mock, and Robert Sproull


The new system has two main purposes:

(1) To provide a means for flexible production of medium-quality
documents such as technical reports, manuals, theses, and books
which may include text, line drawings, half-tone images, and
mathematical symbolism.

(2) To provide a standard representation for such documents that
can be printed or displayed on various kinds of output devices
by various kinds of computers with reasonable results.

The proposed participants in development of the new system are
Stanford University, Carnegie-Mellon University, and Xerox Palo
Alto Research Center.

This proposal was prepared by a committee of Stanford and Xerox
people.  A committee at CMU is concurrently preparing
its own proposal.  The two proposals shall be exchanged as well
as submitted to other interested parties for comment, criticism,
and reconciliation.
BLOCK DIAGRAM

A block diagram of the proposed system is shown in
Figure 1.  Dash-boxes represent computer files; plus-boxes
represent visible copy; starred boxes represent programs.

The system starts with a "scribble" in an author's head or
on paper.  Using a conventional TEXT EDITOR, the author
prepares a "manuscript" file encoded in a PUB-like language.
The manuscript is fed to the FORMATTER program which produces
a "galley proof".  The galley may be printed (or displayed)
by a PRINTER/VIEWER program to be proofread by the author for
errors.  To correct errors, changes are made to the manuscript
and the FORMATTER is run again.

Once an acceptable galley proof is obtained, it is fed to the
PAGINATOR and POLISHER programs which produce a "document"
file.  This file may be printed (or displayed) by the PRINTER/
VIEWER program.  Again, if errors are discovered, corrections
must be made in the manuscript and the cycle repeated.

Auxiliary programs and files that appear in the block diagram
will be explained in subsequent sections.
THE MANUSCRIPT

The manuscript contains sufficient information for the
system to compute the document without human intervention.
Thus, the system is basically non-interactive.  However,
this does not preclude provision for optional interaction
at appropriate points for debugging and advising purposes.

The manuscript is actually a computer program in the as yet
unnamed language P.  P is similar to PUB except that PUB
is an augmented subset of SAIL while P is an extension
of MLISP.  The complete facilities of MLISP are available
to the author, including variables, arrays, for-statements,
recursion, list structures, function declarations, and
interaction.

Among the extensions to MLISP in P are "text expressions",
"math expressions", "calligraphic expressions", "image
expressions", "portion declarations", "area declarations",
and "group declarations".
TEXT EXPRESSIONS

Text expressions are equivalent to "paragraphs" in PUB.
Every text expression has a "class", which may be specified
in the manuscript explicitly by name or implicitly by
form (cf. "AT n" in PUB).  Associated with each class are
formatting procedures.  Examples of classes might be "prose",
"quotation", "table", "heading", and "Algolprogram".

A text expression is composed of "words" and each word is
composed of "virtual glyphs" (formerly called "characters").
An example of a virtual glyph (or "virgle") is "Small Seriph
Italic Upright Black Alpha".  A "Glyph Map" fed to the system
along with the manuscript maps virgles into "actual glyphs"
or "augles".  For example, the glyph map may say that
"Small" is "8 point", "Seriph" is "Elzevir", and "Alpha" is
"Greek 101".  Or it may map all sizes into one, all fonts
into LPTFONT, and all glyph-sets into ASCII characters.

The glyph map is conceptually an n-dimensional sparse array
of functions.  For example, "Large Seriph Italic A" may be
specified as appearing explicitly in a certain glyph file
or may be specified as a scale-reduction applied to an oversize
glyph.
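
The following is a minimal sketch, in Python rather than P, of
one way a glyph map could behave as a sparse array of functions.
The "Small", "Seriph", and "Alpha" values come from the example
above; the "Large" and "A" entries, the four-coordinate tuple,
and all function names are assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

Virgle = Tuple[str, str, str, str]   # (size, font, style, glyph name)
Augle = Dict[str, object]            # concrete attributes of an actual glyph

# "Small", "Seriph", "Alpha" follow the text; "Large" and "A" are made up.
SIZES = {"Small": 8, "Large": 14}
FONTS = {"Seriph": "Elzevir"}
GLYPHS = {"Alpha": "Greek 101", "A": "Latin 101"}


def by_coordinate(v: Virgle) -> Augle:
    """Default rule: translate each coordinate independently."""
    size, font, style, name = v
    return {"points": SIZES[size], "font": FONTS[font],
            "style": style, "glyph": GLYPHS[name]}


def reduced_from_oversize(v: Virgle) -> Augle:
    """Special rule: derive the glyph by scaling down an oversize master."""
    augle = by_coordinate(v)
    augle["derivation"] = "scale-reduction"
    return augle


# Only exceptional cells are stored; every other cell uses the default rule.
SPARSE_RULES: Dict[Virgle, Callable[[Virgle], Augle]] = {
    ("Large", "Seriph", "Italic", "A"): reduced_from_oversize,
}


def map_glyph(v: Virgle) -> Augle:
    """Look up the function stored for this cell, else fall back."""
    return SPARSE_RULES.get(v, by_coordinate)(v)
```

Storing functions only for the exceptional cells keeps the map
small while still letting any single glyph be special-cased.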
GLYPHS

Among the n coordinates that define a glyph are:

(1) Code.  An integer between 40 and 172 octal selecting a
particular character out of a character set.

(2) Set.  A set of up to 91 characters, e.g., Greek Alphabet,
Math symbols 1, Accents.

(3) Case.  Upper, Lower.  Differs only for letters in alphabets.

(4) Style.  Light, Bold, Italic, Bold Italic, Demibold, etc.

(5) Font.  Caslon, Elzevir, Times Roman, Lptfont, Datadiscfont,
JohnDoefont.

(6) Size.  Measured in Points.  The P language has point-pica-inch
conversion primitives.

(7) Orientation.  Upright or some other angle between 0 and 360
degrees.

(8) Thickness.

(9) Texture.

(10) Color.
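
As a quick check on the figures above, the sketch below (Python,
not P) counts the codes from 40 to 172 octal inclusive and shows
what point-pica-inch primitives might look like.  The rounded
conventions of 12 points per pica and 72 points per inch are
assumptions; the proposal does not fix exact ratios, and the
function names are hypothetical.

```python
FIRST_CODE = 0o40    # 32 decimal
LAST_CODE = 0o172    # 122 decimal


def codes_per_set() -> int:
    """Number of codes in the inclusive range 40..172 octal."""
    return LAST_CODE - FIRST_CODE + 1


def is_valid_code(code: int) -> bool:
    """True if the code can select a character within a set."""
    return FIRST_CODE <= code <= LAST_CODE


# Assumed rounded conventions, not taken from the proposal.
POINTS_PER_PICA = 12
POINTS_PER_INCH = 72


def points_to_picas(points: float) -> float:
    return points / POINTS_PER_PICA


def points_to_inches(points: float) -> float:
    return points / POINTS_PER_INCH


def inches_to_points(inches: float) -> float:
    return inches * POINTS_PER_INCH
```

The inclusive range holds 91 codes, which agrees with the limit
of 91 characters per set.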
THE DEVICE SPECIFICATION

A "Device Specification" file must be fed to the system along
with the manuscript and the Glyph Map.  Conceptually, the
Device Specification defines a printing or viewing device
as a set of attributes such as RASTERSCAN, 200PPI, 2FONTS,
NOGRAYSCALE.  Actually, the file is a collection of MLISP
DEFPROPs and procedures through which the FORMATTER,
PAGINATOR, and POLISHER programs filter the manuscript to
obtain a document that can be processed by the PRINTER/VIEWER
program for the specified device.

Keeping such procedures on a separate file (usually in LAP
form for efficiency) keeps the kernel system small even when
new devices are added to its capability.

The PRINTER/VIEWER program and the Device Specification File
are provided by each installation for each of its devices.
It may be possible in some cases for an installation to
use a single P/V and Device Spec for several devices.  In such
a case, a single document file could be printable on all of them.
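
A rough sketch of the idea, in Python instead of MLISP DEFPROPs:
a device is a set of attributes plus procedures through which
items are filtered.  The attribute values echo the example above;
the item fields, font names, and filter functions are all
hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Item = Dict[str, object]
Filter = Callable[[Item], Item]


@dataclass
class DeviceSpec:
    attributes: Dict[str, object]
    filters: List[Filter] = field(default_factory=list)

    def apply(self, item: Item) -> Item:
        """Pass one item through every filter procedure in order."""
        for f in self.filters:
            item = f(item)
        return item


def limit_fonts(allowed: List[str]) -> Filter:
    """Build a filter that substitutes the first allowed font."""
    def f(item: Item) -> Item:
        if "font" in item and item["font"] not in allowed:
            return {**item, "font": allowed[0]}
        return item
    return f


def no_gray(item: Item) -> Item:
    """Reduce images to two levels on a device without gray scale."""
    if item.get("kind") == "image":
        return {**item, "levels": 2}
    return item


# Attributes mirror RASTERSCAN, 200PPI, 2FONTS, NOGRAYSCALE from the text.
EXAMPLE_DEVICE = DeviceSpec(
    attributes={"scan": "RASTERSCAN", "ppi": 200, "fonts": 2,
                "grayscale": False},
    filters=[limit_fonts(["FONT-A", "FONT-B"]), no_gray],
)
```

Because the kernel only calls `apply`, a new device would need a
new specification file but no change to the kernel itself.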
THE FORMATTER

The FORMATTER program is similar to the PARSER and FILLER
modules of PUB.  The PARSER is replaced by the MLISP compiler
and the LISP system.  The FILLER is replaced by modules for
text, math, line-drawings, and images.  The pagination
capabilities of PUB are intentionally omitted to simplify
the FORMATTER and to allow more complex capabilities to be
handled by the PAGINATOR program.

During operation of the FORMATTER, the author can monitor
its progress on a terminal, interrupt it at landmark points,
and interact with it at breakpoints and error points.

The FORMATTER may generate tables of contents, indices, etc.
in manuscript format as in PUB.  If it does, it swaps in an
ALPHABETIZER program to sort the indices. Then the FORMATTER
is swapped back in to process the generated portions.
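
What the ALPHABETIZER step might do can be sketched as follows
(Python, for illustration only).  It assumes each index entry is
a term paired with a cross-reference label, since page numbers
are not yet known at this stage; the entry format is an
assumption, not part of the proposal.

```python
from typing import Dict, List, Tuple


def alphabetize(entries: List[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Merge entries for the same term and sort terms ignoring case."""
    merged: Dict[str, Tuple[str, List[str]]] = {}
    for term, label in entries:
        key = term.casefold()
        if key not in merged:
            # Keep the spelling of the first occurrence of the term.
            merged[key] = (term, [])
        merged[key][1].append(label)
    return [merged[k] for k in sorted(merged)]
```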

A hyphenation capability is included in the text module
for those who like it.

The manuscript is structured into one or more portions,
each of which may be divided into sections.  Non-global
declarations are local to portions and to sections (unlike PUB).
Thus, it is possible to format sections independently, but care
must be taken if there are interactions (e.g., figure numbering
that does not start over at 1 in each section).
GALLEYS

The FORMATTER outputs two files called the "galley" and the "galley
guide" (analogous to the PUInS.PUI and the PUIn.PUI files of PUB).

The galley contains text, drawing directives, and image directives,
with sufficient information so that the Printer/Viewer program
can display it provisionally justified but not paginated.  There
is a single column for each section.  Footnotes and diagrams appear
shortly after the text that references them. Cross-references are not
resolved.

The galley guide is an abstract of the galley in which content is
omitted, size information is elaborated, and pagination directives
are carried forward.  The galley guide contains sufficient information
for the PAGINATOR program to lay out the document into pages, areas,
boxes, and columns.
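
The relation between the two files can be sketched like this
(Python; the record layouts and field names are assumptions, as
the actual formats are still to be standardized):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GalleyItem:
    kind: str                        # "text", "drawing", or "image"
    content: str                     # the material itself
    width: float                     # in points
    height: float                    # in points
    directive: Optional[str] = None  # e.g. a pagination directive


@dataclass
class GuideEntry:
    index: int                       # position of the item in the galley
    kind: str
    width: float
    height: float
    directive: Optional[str]


def make_guide(galley: List[GalleyItem]) -> List[GuideEntry]:
    """Abstract the galley: drop content, keep sizes and directives."""
    return [GuideEntry(i, g.kind, g.width, g.height, g.directive)
            for i, g in enumerate(galley)]
```

Each guide entry keeps an index back into the galley so that a
later program can reunite the layout with the content.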
THE PAGINATOR

The PAGINATOR program does not input the galley but only the
galley guide.  It essentially juggles rectangles and possibly
other shapes to fit them into pages, areas, and columns,
keeping groups together, placing footnotes below their
referents, and keeping figures near the texts that describe
them.

The PAGINATOR needs to know device specifications but nothing
about glyphs.  It also needs to know the author's pagination
directives from the manuscript.  These can all be found in the
galley guide.

The principal output of the PAGINATOR is the "Paginated Galley
Guide".  This is probably in the same format as the Galley Guide,
but its content is sorted, structured, and pruned.

Whenever the PAGINATOR completes a page, it writes all cross-
reference labels that appeared on that page onto a file called
the "Cross-Reference Table" (CRT? -- no, XRT!).
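
A deliberately simple sketch of the rectangle juggling, in
Python: boxes have only a height, a group is a list of boxes
that must share a page, and pages are filled greedily.  The data
layout and the greedy strategy are assumptions; a real PAGINATOR
would also handle areas, columns, and footnotes.  For brevity the
sketch records labels as boxes are placed rather than when a page
is completed.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, List[str]]   # (height in points, labels defined in the box)
Group = List[Box]               # boxes that must stay on the same page


def paginate(groups: List[Group],
             page_height: float) -> Tuple[List[List[Box]], Dict[str, int]]:
    """Greedily place groups on pages; return pages and a label table."""
    pages: List[List[Box]] = [[]]
    used = 0.0
    xrt: Dict[str, int] = {}     # label -> page number, starting at 1
    for group in groups:
        need = sum(height for height, _ in group)
        # Start a new page if the whole group does not fit on this one.
        if pages[-1] and used + need > page_height:
            pages.append([])
            used = 0.0
        pages[-1].extend(group)
        used += need
        for _, labels in group:
            for label in labels:
                xrt[label] = len(pages)
    return pages, xrt
```

A group taller than a page is placed alone on a fresh page and
allowed to overflow; the sketch makes no attempt to split it.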
THE POLISHER

Some Printer/Viewer programs may have the sophistication to be
able to input the galley, the paginated guide, and the XRT and
display a finished document (see dotted line in Figure 1).
However, the normal procedure is to feed them to the POLISHER
program which produces a well-ordered "document" file in which
pages are together and cross-references are resolved.  This
file is easily handled by the P/V.
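
Resolving cross-references against the XRT could look roughly
like the sketch below (Python).  The `{ref:label}` notation for
an unresolved reference is invented for this example; the
proposal does not define how references are written.

```python
import re
from typing import Dict

# Hypothetical notation for an unresolved reference: {ref:label}
REF = re.compile(r"\{ref:([A-Za-z0-9_]+)\}")


def resolve(text: str, xrt: Dict[str, int]) -> str:
    """Replace each known reference by its page number."""
    def substitute(match: "re.Match[str]") -> str:
        label = match.group(1)
        # Leave unknown labels untouched so the error stays visible.
        return str(xrt[label]) if label in xrt else match.group(0)
    return REF.sub(substitute, text)
```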
THE PRINTER/VIEWER

This device-dependent program can print either the galley or
the polished document, because both files are in the same
format.

For raster devices, the P/V may have two passes.  One
generates bit matrices from vector/text representations, while
the other actually prints the matrices.
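
The two passes can be illustrated with a toy example in Python.
Pass one turns line segments into a bit matrix by simple linear
interpolation; pass two "prints" the matrix as rows of
characters.  The vector format and both function names are
assumptions for illustration only.

```python
from typing import List, Tuple

Vector = Tuple[int, int, int, int]   # (x0, y0, x1, y1) in device dots


def rasterize(vectors: List[Vector], width: int, height: int) -> List[List[int]]:
    """Pass one: build a bit matrix from vector representations."""
    bits = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in vectors:
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for i in range(steps + 1):
            x = round(x0 + (x1 - x0) * i / steps)
            y = round(y0 + (y1 - y0) * i / steps)
            if 0 <= x < width and 0 <= y < height:
                bits[y][x] = 1
    return bits


def emit(bits: List[List[int]]) -> List[str]:
    """Pass two: output the matrix one scan line at a time."""
    return ["".join("#" if b else "." for b in row) for row in bits]
```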

The P/V program may be parametric at the option of the installation.
In certain cases, it may be possible to substitute certain fonts for
others, to change the resolution specification, or to select certain
pages for output.

The P/V is the only program that looks at the actual images
of glyphs.  These glyphs are in a form appropriate to the
device, e.g., octal code, bit matrix, vector outline.  The
actual image is normally computed from a contour representation
extracted from the Registry.
THE REGISTRY

There is a Network Registry of Glyphs as well as local
registries.  A document referring to a local registry
cannot be transmitted over the Network.  Use of local
registries should be limited to storing new glyphs that
have not had an opportunity to be registered in the
Network Registry.

The Registry consists of a Glossary and a Directory.

The Glossary lists the available Sets, Cases,
Styles, Fonts, and so forth.  There is a procedure
for adding new entries to the Glossary, e.g., the
Russian alphabet to the Set Glossary or Clarendon to
the Font Glossary.  It is also possible to add new
characters to existing incomplete sets.

The Directory lists every Glyph File registered by
a participating installation, including its coordinates
in the sparse array, complete file name, and site name.
The coordinates must use the terminology of the
Glossary.

It is not permissible to change a glyph file once it has
been registered in the Directory.
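
The rules above can be sketched as a small Python class.  The
class, its method names, and the sample site and file names used
with it are hypothetical.  Immutability is approximated here by
refusing to register the same file name from the same site
twice, which is an assumption about how the rule would be
enforced.

```python
from typing import Dict, Set, Tuple


class Registry:
    def __init__(self) -> None:
        # Glossary: for each coordinate (Set, Font, ...) the known terms.
        self.glossary: Dict[str, Set[str]] = {}
        # Directory: (site, file name) -> coordinates in the sparse array.
        self.directory: Dict[Tuple[str, str], Dict[str, str]] = {}

    def add_term(self, coordinate: str, term: str) -> None:
        """Add a new entry, e.g. Clarendon to the Font Glossary."""
        self.glossary.setdefault(coordinate, set()).add(term)

    def register(self, site: str, file_name: str,
                 coordinates: Dict[str, str]) -> None:
        """List a glyph file; coordinates must use Glossary terms."""
        for coordinate, term in coordinates.items():
            if term not in self.glossary.get(coordinate, set()):
                raise ValueError(f"{term!r} is not in the {coordinate} Glossary")
        key = (site, file_name)
        if key in self.directory:
            raise ValueError("a registered glyph file may not be changed")
        self.directory[key] = dict(coordinates)
```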
GLYPH FILES

Each Network Glyph File defines up to 91 glyphs.  The file header
contains geometric information needed by the FORMATTER and
POLISHER programs, such as height, width, kerning information,
and transformation clues for changing scale, orientation,
and thickness.  The remainder of the file contains a curved
contour representation of each glyph.

Each local installation is expected to have its own GLYPH
CONVERTER to generate local glyph files (see Figure 2).
The headers are simply copied from Network Glyph Files,
possibly changing scale, orientation, and thickness.  The
contours are converted to bit matrices or vector outlines
as appropriate.

In the case of trivial devices such as line printers, trivial
glyph files should be produced by the installation.  However,
it is important to stay within the framework of the registry.
For example, if the LPT has an integral sign, it should be
specified in the glyph map as, say, "math-set 63" rather than
as "latin-set 14".  The local math-set glyph file would then
specify that glyph 63 is really octal 14 on the LPT.  Other
glyphs in the local math-set file would have no good
representation on the LPT.
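
The integral-sign example can be written out as a small lookup
(Python, illustrative only).  The glyph number "63" is kept as
an opaque label because the text does not state its radix; the
table layout and function name are assumptions.

```python
from typing import Dict, Optional

# Local glyph files for a line printer: set name -> glyph -> LPT code.
LPT_GLYPH_FILES: Dict[str, Dict[str, int]] = {
    "math-set": {"63": 0o14},   # integral sign is octal 14 on this LPT
}


def lpt_code(glyph_set: str, glyph: str,
             files: Dict[str, Dict[str, int]] = LPT_GLYPH_FILES) -> Optional[int]:
    """Return the device code, or None if the LPT cannot show the glyph."""
    return files.get(glyph_set, {}).get(glyph)
```

The glyph map keeps the registry's name for the glyph, and only
the local glyph file knows the device code.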
THE MLISP EXTENSION

Several simple changes to MLISP will be made:

(1) Contraction.  Some features that would be useless to the
system and to most authors will be removed in the interest of
saving space.  Authors needing these features could LAP them in.

(2) Macros.  The MLISP "DEFINE" only replaces one token by another.
Macros in P must be able to replace either an identifier or a
sequence of delimiters by an arbitrary sequence of tokens.
Invisible tokens such as spaces, tabs, and line boundaries must
be recognized as tokens in text expressions of P.

(3) Strings.  The LISP string facilities are different in every
system and inadequate in all.  P will have its own string package
with a few primitives to be encoded in LAP for each object machine.
A string will be a series of glyphs; thus, the package would compute
widths and heights of text units such as words at high speed.
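
The string package's central operation might be sketched as
follows (Python rather than LAP).  A word is a series of glyphs,
and its extent is computed from per-glyph metrics.  The metrics
table and its values are made up for the example, and kerning is
ignored.

```python
from typing import Dict, List, Tuple

# glyph -> (width, height) in points; values are illustrative only.
METRICS: Dict[str, Tuple[int, int]] = {
    "c": (5, 6),
    "a": (5, 6),
    "t": (3, 8),
}


def word_extent(word: List[str],
                metrics: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Width is the sum of glyph widths; height is the tallest glyph."""
    width = sum(metrics[g][0] for g in word)
    height = max((metrics[g][1] for g in word), default=0)
    return width, height
```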
ADVANTAGES AND DISADVANTAGES OF MLISP

Among the advantages of an MLISP implementation of the new
system are:

(1) Efficiency.  The language will be processed by an extension
of the existing MLISP compiler, which translates at 3000
lines per minute, more than three times faster than PUB
Pass One.  Most PUB macros could be procedures (EXPRs and
FEXPRs) in P, so their execution will be several times faster
than in PUB (PUB spends much of its time expanding macros).

(2) Flexibility.  Author procedures could directly call
or redefine procedures in the system.  During debugging,
the author could set breakpoints and perform traces.

(3) Portability.  The extended MLISP compiler will be written
mostly in STANDARD LISP, so that it will be transportable
to new installations with a minimum of effort.

The system should run equally well (except for speed differences) in
LISP1.6, TENEX-LISP, ILSP, MACLISP, and LISP70.  With a small amount
of LAP programming, it should run in LISPs on computers other than
the PDP-10 as well.

Disadvantages of MLISP are:

(1) Size.  The LISP1.6 version of the FORMATTER will probably be
nearly as large as PUB Pass One, because of LISP and MLISP overhead.
This will be remedied when LISP70 is operational.

(2) Inefficiency.  The PAGINATOR and POLISHER may be simple enough to
be programmed in machine language at a substantial gain in efficiency.
This may be done after portable LISP versions are operational.
STANDARDS

The following file formats shall be standardized:

(1) Individual Documents

	a. Manuscript.
	b. Galley and Document (same format).
	c. Cross-Reference Table.
	d. Galley Guide.
	e. Paginated Galley Guide (similar to d?).

(2) Registry

	a. Glossary
	b. Directory
	c. Glyph File Header
	d. Curved Contour Representation

The following programs shall be written in portable fashion:

(1) FORMATTER

(2) PAGINATOR

(3) POLISHER
REALIZATION

Manuscript and Registry standards shall be proposed by
Palo Alto and Galley and Document standards by Pittsburgh.

The FORMATTER shall be programmed by Rich Johnson and
Brian Harvey with assistance from Larry Tesler.

The PAGINATOR and POLISHER shall be programmed at CMU.

MLISP extensions shall be made at Stanford.

The ILSP implementation will be maintained by CMU, the
LISP1.6 (and later LISP70) implementations by Stanford,
and the TENEX-LISP implementation by Xerox.

Each installation shall provide its own glyph converters,
text editors, device specifications, and printer/viewers.
However, the possibility of collaborating on XGP service
should be explored as the project proceeds.  CMU shall
be the motivating force and shall do most of the programming.

A target date of August 15 is suggested for a first version
of the system.  Although only a subset will be implemented
in the first version, the framework for supplying the
remainder must be provided.

This optimistic estimate is based on the fact that PUB
was completed in six months by one person in an
inappropriate language.  The new implementation is simplified
by separating pagination from filling and by building on
an existing compiler.  Although the new system has many
sophisticated facilities, they have all been done before in
some form by some of the implementors.
APPENDICES

Included for completeness are memos by Dan Swinehart on
the registry, by Brian Harvey on math, and by Bob Sproull
on graphics.  Unfortunately, Sproull's document is
not machine-readable, so only an abstract appears here.
FIGURE 1
               +++++++++++
              |  SCRIBBLE |
               +++++++++++
                    |
                    ∨
               ***********
              |TEXT EDITOR|
               ***********
                    |
                    ∨
               -----------
              |           |
              | MANUSCRIPT|
              |           |
               -----------
                    |
                    |<-------------------------
 +++++++++          ∨                          |
|         |    ***********      ************   |
| MONITOR |<--| FORMATTER |--->|ALPHABETIZER|--
|         |    ***********      ************
 +++++++++     |        |
               ∨        ∨
    ------------    ----------
   |            |  |          |
   |GALLEY GUIDE|  |  GALLEY  |-----
   |            |  |          |     |
    ------------    ----------      |
             |                      |
             ∨                      |
            ***********             |
           | PAGINATOR |            |
            ***********             |
             |       |              |
             ∨       ∨              |
     -----------   -----------      |
    | PAGINATED | |   CROSS   |     |
    |  GALLEY   | | REFERENCE |     |
    |   GUIDE   | |   TABLE   |     |
     -----------   -----------      |
             |           |        -----
             |           |       |     |
             ∨           ∨       ∨     |
             ---------------------     |
                 |             .       |
                 ∨             .       |
            ***********        .       |
           |  POLISHER |       .       |
            ***********        .       |
                 |             .       |
                 ∨             .       |
            -----------        .       |                  +++++++++
           |           |       ∨       ∨   *********     |HARD COPY|
           |  DOCUMENT |------------------| PRINTER |--->|   OR    |
           |           |                  | /VIEWER |    | DISPLAY |
            -----------                    *********      +++++++++
FIGURE 2

         ----------
        |          |
        | REGISTRY |
        |          |
         ----------
             |
             ∨
         ***********
        | CONVERTER |
         ***********
          |       |
          ∨       ∨
 ------------    ---------
|  GLYPH     |  |  GLYPH  |
|DESCRIPTIONS|  | IMAGES  |
 ------------    ---------