perm filename XPUB.PUB[2,TES] blob sn#036547 filedate 1973-04-18 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00023 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00003 00002	.COMMENT THE NEW DOCUMENT SYSTEM 
C00004 00003	.GROUP SKIP 5
C00006 00004	.SEC BLOCK DIAGRAM
C00010 00005	.SEC THE MANUSCRIPT
C00012 00006	.SS TEXT EXPRESSIONS
C00014 00007	.SS GLYPHS
C00016 00008	.SS THE DEVICE SPECIFICATION
C00018 00009	.SS THE FORMATTER
C00021 00010	.SEC GALLEYS
C00023 00011	.SS THE PAGINATOR
C00025 00012	.SS THE POLISHER AND THE DOCUMENT
C00026 00013	.SS THE PRINTER/VIEWER
C00028 00014	.SEC THE REGISTRY
C00030 00015	.SS GLYPH FILES
C00032 00016	.SEC THE MLISP EXTENSION
C00034 00017	.SS ADVANTAGES AND DISADVANTAGES OF MLISP
C00037 00018	.SEC STANDARDS
C00038 00019	.SEC REALIZATION
C00041 00020	.SEC PROPOSAL FOR GRAPHICS LANGUAGE
C00077 00021	.SEC FIGURE 1
C00081 00022	.SEC FIGURE 2
C00082 00023	.FILL
C00083 ENDMK
C⊗;
.COMMENT THE NEW DOCUMENT SYSTEM ;
.TURN ON "{"
.NOJUST
.REQUIRE "PUBMAC.DFS[1,3]" SOURCE_FILE
.STANDARD FRONT("I", "!-A")
.EVEN HEADING({SECNAME},,{DATE})
.ODD HEADING({DATE},,{SECNAME})
.EVERY FOOTING(,{PAGE!})
.GROUP SKIP 5
.BEGIN CENTER
A PROPOSAL FOR THE NEW DOCUMENT SYSTEM

Larry Tesler, Brian Harvey, Lester Earnest,
Tovar Mock, and Robert Sproull
.END
.SKIP 4
The new system has two main purposes:

(1) To provide a means for flexible production of medium-to-high quality
documents such as technical reports, manuals, theses, and books
which may include text, line drawings, half-tone images, and
mathematical symbolism.

(2) To provide a standard representation for such documents that
can be printed or displayed on various kinds of output devices
by various kinds of computers with reasonable results.

The proposed participants in development of the new system are
Stanford University, Carnegie-Mellon University, and Xerox Palo
Alto Research Center.

This proposal was prepared by the Palo Alto Committee, consisting
of Stanford and Xerox
people.  It deals with the overall organization of the system
and with details pertaining to front-end processing.
The Pittsburgh committee at CMU has prepared a proposal dealing
with the tail-end of the system.  The two proposals shall be exchanged as well
as submitted to other interested parties for comment, criticism,
and reconciliation.
.SEC BLOCK DIAGRAM

A possible block diagram of the proposed system is shown in
Figure 1.  Dash-boxes represent computer files; plus-boxes
represent visible copy; starred boxes represent programs.

The system starts with a "scribble" in an author's head or
on paper.  Using a conventional TEXT EDITOR, the author
prepares a "manuscript" file encoded in a PUB-like language.
The manuscript is fed to the FORMATTER program which produces
a "galley proof".  The galley may be printed (or displayed)
by a PRINTER/VIEWER program to be proofread by the author for
errors.  To correct errors, changes are made to the manuscript
and the FORMATTER is run again.

Once an acceptable galley proof is obtained, it is fed to the
PAGINATOR and POLISHER programs which produce a "document"
file.  This file may be printed (or displayed) by the
PRINTER/VIEWER program.  Again, if errors are discovered, corrections
must be made in the manuscript and the cycle repeated.

Auxiliary programs and files that appear in the block diagram
will be explained in subsequent sections.

This system has an inherent flaw.  It is not possible to
create non-rectangular columns of text, because the FORMATTER
does not know where page boundaries will fall.  Non-rectangular
columns are useful for displaying insets.

To remedy this flaw, an alternate system organization, that of PUB,
is under consideration.  Formatting and pagination are
performed in a single pass with backtracking.  For example,
at the beginning of a "group" (all to appear on one page),
a decision point (choice/failset point) is set with two
choices: continue on this page or go to next page.  If the page
runs out before the group is finished, a failure resets the state
to that at the beginning of the group and the alternate choice
is made.
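
The choice-point mechanism just described can be sketched as follows.
This is a minimal illustration in Python rather than P or SAIL, assuming
each group has already been reduced to a height in lines and that no
group may be split across pages; the names paginate and page_height are
hypothetical.  Each group is a decision point with the two choices named
above, and a failure returns control to the most recent decision point,
which then tries its alternate choice.

.BEGIN VERBATIM
```python
from typing import List, Optional, Tuple

# Sketch only: groups are heights in lines, and no group may be split.
# paginate() and page_height are hypothetical names, not part of PUB or P.
Placement = Tuple[int, int]  # (page number, line offset of the group on that page)


def paginate(groups: List[int], page_height: int) -> Optional[List[Placement]]:
    """Place every group, backtracking on failure; None if no layout exists."""

    def place(i: int, page: int, used: int) -> Optional[List[Placement]]:
        if i == len(groups):
            return []  # all groups placed
        height = groups[i]
        # Decision point: first try "continue on this page", then "go to next page".
        for next_page in (False, True):
            p, u = (page + 1, 0) if next_page else (page, used)
            if u + height > page_height:
                continue  # the page ran out before the group was finished
            rest = place(i + 1, p, u + height)
            if rest is not None:
                return [(p, u)] + rest
        return None  # both choices failed: the caller tries its alternate

    return place(0, 1, 0)
```
.END

For example, paginate([3, 4, 2], 6) yields [(1, 0), (2, 0), (2, 4)]: the
second group does not fit after the first, so its alternate choice moves
it to page 2.  In this toy model only a group taller than a page causes
deeper backtracking; richer constraints such as footnotes would exercise
it more.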

In PUB, backtracking was slow and overly restricted because of the
limitations of SAIL.  A new system using this organization would
be written in a language that has a built-in backtracking capability.

The disadvantage of the combined FORMATTER/PAGINATOR is that
galleys do not come out so soon for proofreading.  The combined
version would probably run 1.5 to 3 times slower than the
FORMATTER alone would, depending on the amount of backtracking
required (estimates based on experience with programming
language compilers).

Most of this document will assume separate FORMATTER and PAGINATOR
programs.  However, the implications for a combined system
should be obvious.
.SEC THE MANUSCRIPT

The manuscript contains sufficient information for the
system to compute the document without human intervention.
Thus, the system is basically non-interactive.  However,
this does not preclude provision for optional interaction
at appropriate points for debugging and advising purposes.

The manuscript is actually a computer program in the as yet
unnamed language P.  P is similar to PUB except that PUB
is an augmented subset of SAIL while P is an extension
of MLISP.  The complete facilities of MLISP are available
to the author, including variables, arrays, for-statements,
recursion, list structures, function declarations, and
interaction.

Among the extensions to MLISP in P are "text expressions",
"math expressions", "calligraphic expressions", "image
expressions", "portion declarations", "area declarations",
and "group declarations".
.SS TEXT EXPRESSIONS

Text expressions are equivalent to "paragraphs" in PUB.
A text expression has a syntax such as "a blank line
followed by an indented line followed by several
unindented lines".  The compiler translates it to another
format according to similar syntax specifications.
Examples of text expression formats might be "prose",
"quotation", "table", "heading", and "Algolprogram".
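
As a rough illustration of matching one such syntax, the following
Python sketch recognizes the "prose" format quoted above: an indented
line that follows a blank line starts a new paragraph, and unindented
lines continue it.  The function name prose_paragraphs is hypothetical,
and the sketch simply fills each paragraph into one string instead of
translating it to a real output format.

.BEGIN VERBATIM
```python
from typing import List


def prose_paragraphs(lines: List[str]) -> List[str]:
    """Split manuscript lines into 'prose' paragraphs (hypothetical sketch).

    A new paragraph starts at an indented line that follows a blank line;
    unindented lines continue the current paragraph.
    """
    paragraphs: List[List[str]] = []
    after_blank = True
    for line in lines:
        if not line.strip():
            after_blank = True
            continue
        indented = line != line.lstrip()
        if (after_blank and indented) or not paragraphs:
            paragraphs.append([])
        paragraphs[-1].append(line.strip())
        after_blank = False
    return [" ".join(words) for words in paragraphs]
```
.END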

A text expression is composed of "words" and each word is
composed of "virtual glyphs" (formerly called "characters").
An example of a virtual glyph (or "virgle") is "Small Seriph
Italic Upright Black Alpha".  A "Glyph Map" fed to the system
along with the manuscript maps virgles into "actual glyphs"
or "augles".  For example, the glyph map may say that
"Small" is "8 point", "Seriph" is "Elzevir", and "Alpha" is
"Greek 101".  Or it may map all sizes into one, all fonts
into LPTFONT, and all glyph-sets into ASCII characters.

The glyph map is conceptually an n-dimensional sparse array
of functions.  For example, "Large Seriph Italic A" may be
specified as appearing explicitly in a certain glyph file
or may be specified as a scale-reduction applied to an oversize
glyph.
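
One way to picture the glyph map as a sparse array of functions is
sketched below in Python.  The classes Virgle, Augle, and GlyphMap, the
rule names, and the table contents are hypothetical; only the examples
themselves ("Small" as 8 point, "Seriph" as Elzevir, "Alpha" as Greek
101, the line-printer collapse, and the reduction of an oversize glyph)
come from the text.  Coordinates with no explicit entry fall through to
a default rule, which is what keeps the array sparse.

.BEGIN VERBATIM
```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Virgle:  # virtual glyph, as named in the manuscript
    size: str
    face: str
    style: str
    char: str


@dataclass(frozen=True)
class Augle:  # actual glyph, as found in some glyph file
    points: int
    font: str
    style: str
    glyph_set: str
    code: str
    scale: float = 1.0  # below 1.0: reduce an oversize glyph


Rule = Callable[[Virgle], Augle]


class GlyphMap:
    """Sparse map from virgles to rules; missing entries use the default rule."""

    def __init__(self, default: Rule) -> None:
        self.default = default
        self.entries: Dict[Virgle, Rule] = dict()

    def define(self, virgle: Virgle, rule: Rule) -> None:
        self.entries[virgle] = rule

    def lookup(self, virgle: Virgle) -> Augle:
        return self.entries.get(virgle, self.default)(virgle)


SIZES = dict(Small=8, Large=14)      # hypothetical point sizes
FACES = dict(Seriph="Elzevir")
SETS = dict(Alpha=("Greek", "101"))  # other names fall back to Latin


def typeset_rule(v: Virgle) -> Augle:
    glyph_set, code = SETS.get(v.char, ("Latin", v.char))
    return Augle(SIZES[v.size], FACES.get(v.face, v.face), v.style, glyph_set, code)


def lpt_rule(v: Virgle) -> Augle:
    # All sizes into one, all fonts into LPTFONT, all sets into ASCII.
    return Augle(10, "LPTFONT", "Light", "ASCII", v.char[0])


book = GlyphMap(typeset_rule)
# "Large Seriph Italic A" as a scale reduction of a hypothetical 24-point glyph.
book.define(Virgle("Large", "Seriph", "Italic", "A"),
            lambda v: Augle(24, "Elzevir", "Italic", "Latin", "A", scale=14 / 24))
```
.END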
.SS GLYPHS

Among the n coordinates that define a glyph are:

(1) Code.  A small integer selecting a
particular character out of a character set.

(2) Set.  A set of up to 91 characters, e.g., Greek Alphabet,
Math symbols 1, Accents.

(3) Case.  Upper, Lower.  Differs only for letters in alphabets.

(4) Style.  Light, Bold, Italic, Bold Italic, Demibold, etc.

(5) Font.  Caslon, Elzevir, Times Roman, Lptfont, Datadiscfont,
JohnDoefont.

(6) Size.  Measured in Points.  The P language has point-pica-inch
conversion primitives.

(7) Orientation.  Upright or some other angle between 0 and 360
degrees.

(8) Thickness.

(9) Texture.

(10) Color.
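
Item (6) mentions point-pica-inch conversion primitives.  A minimal
Python sketch follows, assuming 12 points to the pica and 6 picas (72
points) to the inch; the traditional printer's point is slightly
smaller, about 1/72.27 inch, so an implementation would have to choose
one convention.  The function names are hypothetical.  Exact fractions
avoid accumulated rounding error, and the last function converts to
device dots for a resolution such as 200 PPI.

.BEGIN VERBATIM
```python
from fractions import Fraction

# Assumed convention (hypothetical constants): 12 pt = 1 pica, 6 picas = 1 inch.
POINTS_PER_PICA = 12
PICAS_PER_INCH = 6
POINTS_PER_INCH = POINTS_PER_PICA * PICAS_PER_INCH  # 72


def points_to_picas(pt) -> Fraction:
    return Fraction(pt) / POINTS_PER_PICA


def picas_to_points(pc) -> Fraction:
    return Fraction(pc) * POINTS_PER_PICA


def points_to_inches(pt) -> Fraction:
    return Fraction(pt) / POINTS_PER_INCH


def inches_to_points(inches) -> Fraction:
    return Fraction(inches) * POINTS_PER_INCH


def points_to_dots(pt, ppi: int) -> Fraction:
    """Device dots for a length in points on a device with ppi dots per inch."""
    return points_to_inches(pt) * ppi
```
.END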
.SS THE DEVICE SPECIFICATION

A "Device Specification" file must be fed to the system along
with the manuscript and the Glyph Map.  Conceptually, the
Device Specification defines a printing or viewing device
as a set of attributes such as RASTERSCAN, 200PPI, 2FONTS,
NOGRAYSCALE.  Actually, the file is a collection of MLISP
DEFPROPs and procedures through which the FORMATTER,
PAGINATOR, and POLISHER programs filter the manuscript to
obtain a document that can be processed by the PRINTER/VIEWER
program for the specified device.

Keeping such procedures on a separate file (usually in LAP
form for efficiency) keeps the kernel system small even when
new devices are added to its capability.
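
To make the idea concrete, here is a hedged Python sketch of a Device
Specification as an attribute set plus filter procedures applied to
each item of the output stream.  The item representation, the filter
names, and the XGP_LIKE specification are all hypothetical; the real
file would hold MLISP DEFPROPs and procedures, probably in LAP form.

.BEGIN VERBATIM
```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

Item = Tuple[str, object]  # hypothetical: ("gray", 0.3), ("font", 3), ("text", "x")
Filter = Callable[[Item], Item]


@dataclass(frozen=True)
class DeviceSpec:
    attributes: FrozenSet[str]
    filters: Tuple[Filter, ...]

    def apply(self, items: List[Item]) -> List[Item]:
        """Pass every item through each filter procedure in turn."""
        out = []
        for item in items:
            for f in self.filters:
                item = f(item)
            out.append(item)
        return out


def no_gray_scale(item: Item) -> Item:
    kind, value = item
    if kind == "gray":  # threshold ink coverage to all or nothing
        return ("gray", 1.0 if value >= 0.5 else 0.0)
    return item


def max_fonts(n: int) -> Filter:
    def limit(item: Item) -> Item:
        kind, value = item
        if kind == "font" and value >= n:  # fonts are numbered 0 .. n-1
            return ("font", n - 1)
        return item
    return limit


XGP_LIKE = DeviceSpec(
    frozenset(["RASTERSCAN", "200PPI", "2FONTS", "NOGRAYSCALE"]),
    (no_gray_scale, max_fonts(2)),
)
```
.END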

The PRINTER/VIEWER program and the Device Specification File
are provided by each installation for each of its devices.
It may be possible in some cases for an installation to
use a single P/V and Device Spec for several devices.  In such
a case, a single document file could be printable on all of them.
.SS THE FORMATTER

The FORMATTER program is similar to the PARSER and FILLER
modules of PUB.  The PARSER is replaced by the MLISP compiler
and the LISP system.  The FILLER is replaced by modules for
text, math, line-drawings, and images.  The pagination
capabilities of PUB are intentionally omitted to simplify
the FORMATTER and to allow more complex capabilities to be
handled by the PAGINATOR program.

During operation of the FORMATTER, the author can monitor
its progress on a terminal, interrupt it at landmark points,
and interact with it at breakpoints and error points.

The FORMATTER may generate tables of contents, indices, etc.
in manuscript format as in PUB.  If it does, it swaps in an
ALPHABETIZER program to sort the indices. Then the FORMATTER
is swapped back in to process the generated portions.

A hyphenation capability is included in the text module
for those who like it.

The manuscript is structured into one or more portions,
each of which may be divided into fragments.  Non-global
declarations are local to portions and to fragments (unlike PUB).
Thus, it is possible to format fragments independently.  The system
will remember the states of the few counters and other variables
at the end of each fragment that affect the processing of the next
fragment.
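
The bookkeeping that makes independent formatting possible might look
like the following Python sketch.  Everything here is hypothetical: the
two counters in CarryState, the toy format_fragment that only numbers
sections and notes, and the FragmentCache class.  When one fragment is
edited, it is reformatted from the saved state at the end of the
previous fragment; later fragments must be redone only if its own end
state changes.

.BEGIN VERBATIM
```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class CarryState:
    """The few values that carry over from one fragment to the next."""
    section: int = 0
    footnote: int = 0


def format_fragment(lines: List[str], state: CarryState) -> Tuple[List[str], CarryState]:
    """Toy formatter: numbers 'SEC' headings and 'NOTE' lines."""
    out = []
    for line in lines:
        if line.startswith("SEC "):
            state = replace(state, section=state.section + 1, footnote=0)
            out.append(str(state.section) + ". " + line[4:])
        elif line.startswith("NOTE "):
            state = replace(state, footnote=state.footnote + 1)
            out.append("[" + str(state.footnote) + "] " + line[5:])
        else:
            out.append(line)
    return out, state


class FragmentCache:
    """Remembers the carry-over state at the end of every fragment."""

    def __init__(self, fragments: List[List[str]]) -> None:
        self.fragments = [list(f) for f in fragments]
        self.end_states: List[CarryState] = []
        state = CarryState()
        for fragment in self.fragments:
            _, state = format_fragment(fragment, state)
            self.end_states.append(state)

    def edit(self, i: int, new_lines: List[str]) -> Tuple[List[str], bool]:
        """Reformat fragment i alone; report whether later fragments are affected."""
        self.fragments[i] = list(new_lines)
        start = self.end_states[i - 1] if i > 0 else CarryState()
        out, end = format_fragment(self.fragments[i], start)
        changed = end != self.end_states[i]
        self.end_states[i] = end
        return out, changed
```
.END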
.SEC GALLEYS

The FORMATTER outputs two files called the "galley" and the "galley
guide" (analogous to the PUInS.PUI and the PUIn.PUI files of PUB).

The galley contains text, drawing directives, and image directives,
with sufficient information so that the Printer/Viewer program
can display it provisionally justified but not paginated.  There
is a single column for each section.  Footnotes and diagrams appear
shortly after the text that refers to them.  Cross-references are not
resolved.

The galley guide is an abstract of the galley in which content is
omitted, size information is elaborated, and pagination directives
are carried forward.  The galley guide contains sufficient information
for the PAGINATOR program to lay out the document into pages, areas,
boxes, and columns.
.SS THE PAGINATOR

The PAGINATOR Program does not input the galley but only the
galley guide.  It essentially juggles rectangles and possibly
other shapes to fit them into pages, areas, and columns,
keeping groups together, placing footnotes below their
referents, and keeping figures near the texts that describe
them.

The PAGINATOR needs to know device specifications but nothing
about glyphs.  It also needs to know the author's pagination
directives from the manuscript.  These can all be found in the
galley guide.

The principal output of the PAGINATOR is the "Page
Guide".  This is probably in the same format as the Galley Guide,
but its content is sorted, structured, and pruned.

Whenever the PAGINATOR completes a page, it writes all cross-reference
labels that appeared on that page onto a file called
the "Cross-Reference Table" (CRT? -- no, XRT!).
.SS THE POLISHER AND THE DOCUMENT

Some Printer/Viewer programs may have the sophistication to be
able to input the galley, the page guide, and the XRT and
display a finished document (see dotted line in Figure 1).
However, the normal procedure is to feed them to the POLISHER
program which produces a well-ordered "document" file in which
pages are together and cross-references are resolved.  This
file is easily handled by the P/V.
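
The division of labor between the PAGINATOR, which records labels page
by page, and the POLISHER, which resolves references, can be sketched in
Python as follows.  The function names and the @label@ reference syntax
are hypothetical; an unresolved label is shown as ?? so that it stands
out in proof.

.BEGIN VERBATIM
```python
import re
from typing import Dict, List


def build_xrt(pages: List[List[str]]) -> Dict[str, int]:
    """PAGINATOR side: pages[k] lists the labels set on page k + 1."""
    xrt: Dict[str, int] = dict()
    for number, labels in enumerate(pages, start=1):
        # Written out as each page is completed.
        for label in labels:
            xrt[label] = number
    return xrt


def resolve(text: str, xrt: Dict[str, int]) -> str:
    """POLISHER side: replace each @label@ reference with its page number."""
    return re.sub(r"@(\w+)@", lambda m: str(xrt.get(m.group(1), "??")), text)
```
.END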
.SS THE PRINTER/VIEWER

This device-dependent program can print either the galley or
the polished document, because both files are in the same
format.

For raster devices, the P/V may have two passes.  One
generates bit matrices from vector/text representations, while
the other actually prints the matrices.
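
A toy version of the two passes is sketched below in Python.  The first
pass draws straight vector segments into a bit matrix with Bresenham's
line algorithm, a standard technique chosen here for illustration
rather than taken from this proposal; the second pass emits the matrix
one scan line at a time, with # for a set bit, standing in for the
printing hardware.  The function names are hypothetical.

.BEGIN VERBATIM
```python
from typing import List, Tuple

Segment = Tuple[int, int, int, int]  # x0, y0, x1, y1 in device dots


def rasterize(segments: List[Segment], width: int, height: int) -> List[List[int]]:
    """Pass one: draw vector segments into a bit matrix (Bresenham's algorithm)."""
    bits = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in segments:
        dx, dy = abs(x1 - x0), -abs(y1 - y0)
        sx = 1 if x0 < x1 else -1
        sy = 1 if y0 < y1 else -1
        err = dx + dy
        while True:
            if 0 <= x0 < width and 0 <= y0 < height:
                bits[y0][x0] = 1
            if x0 == x1 and y0 == y1:
                break
            e2 = 2 * err
            if e2 >= dy:
                err += dy
                x0 += sx
            if e2 <= dx:
                err += dx
                y0 += sy
    return bits


def print_rows(bits: List[List[int]]) -> List[str]:
    """Pass two: emit the matrix one scan line at a time."""
    return ["".join("#" if b else "." for b in row) for row in bits]
```
.END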

The P/V program may be parametric at the option of the installation.
In certain cases, it may be possible to substitute certain fonts for
others, to change the resolution specification, or to select certain
pages for output.

The P/V is the only program that looks at the actual images
of glyphs.  These glyphs are in a form appropriate to the
device, e.g., octal code, bit matrix, vector outline.  The
actual image is normally computed from a contour representation
extracted from the Registry.
.SEC THE REGISTRY

There is a Network Registry of Glyphs as well as local
registries.  A document referring to a local registry
cannot be transmitted over the Network.  Use of local
registries should be limited to storing new glyphs that
have not had an opportunity to be registered in the
Network Registry.

The Registry consists of a Glossary and a Directory.

The Glossary lists the available Sets, Cases,
Styles, Fonts, and so forth.  There is a procedure
for adding new entries to the Glossary, e.g., the
Russian alphabet to the Set Glossary or Clarendon to
the Font Glossary.  It is also possible to add new
characters to existing incomplete sets.

The Directory lists every Glyph File registered by
a participating installation, including its coordinates
in the sparse array, complete file name, and site name.
The coordinates must use the terminology of the
Glossary.

It is not permissible to change a glyph file once it has
been registered in the Directory.

A font book will be published periodically to help people
find what they need in the Registry.
.SS GLYPH FILES

Each Network Glyph File defines a set of glyphs.  The file header
contains geometric information needed by the FORMATTER and
POLISHER programs, such as height, width, kerning profiles,
and transformation clues for changing scale, orientation,
and thickness.  The remainder of the file contains a curved
contour representation of each glyph.

Each local installation is expected to have its own GLYPH
CONVERTER to generate local glyph files (see Figure 2).
The headers are simply copied from Network Glyph Files,
possibly changing scale, orientation, and thickness.  The
contours are converted to bit matrices or vector outlines
as appropriate.

In the case of trivial devices such as line printers, trivial
glyph files should be produced by the installation.  However,
it is important to stay within the framework of the registry.
For example, if the LPT has an integral sign, it should be
specified in the glyph map as, say, "math-set 63" rather than
as "latin-set 14".  The local math-set glyph file would then
specify that glyph 63 is really octal 14 on the LPT.  Other
glyphs in the local math-set file would have no good
representation on the LPT.
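
The line-printer example can be sketched as a tiny local glyph file in
Python.  The table name, key layout, and fallback character are
hypothetical.  The point it illustrates is that the manuscript names the
registry coordinate (math-set 63), and only the local file knows that
this glyph happens to be octal 14 on one particular printer.

.BEGIN VERBATIM
```python
from typing import Dict, Tuple

GlyphKey = Tuple[str, int]  # (registry set, code within that set)

# Hypothetical local math-set glyph file for one line printer.
LPT_MATH_SET: Dict[GlyphKey, int] = dict()
LPT_MATH_SET[("math", 63)] = 0o14  # integral sign: octal 14 on this LPT


def lpt_code(key: GlyphKey, fallback: int = ord("?")) -> Tuple[int, bool]:
    """Printer code for a registry glyph, and whether it is a faithful rendering."""
    if key in LPT_MATH_SET:
        return LPT_MATH_SET[key], True
    return fallback, False  # no good representation on the LPT
```
.END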
.SEC THE MLISP EXTENSION

If the separate FORMATTER/PAGINATOR organization is followed,
several simple changes to MLISP will be made:

(1) Contraction.  Some features that would be useless to the
system and to most authors will be removed in the interest of
saving space.  Authors needing these features could LAP them in.

(2) Macros.  The MLISP "DEFINE" only replaces one token by another.
Macros in P must be able to replace either an identifier or a
sequence of delimiters by an arbitrary sequence of tokens.
Invisible tokens such as spaces, tabs, and line boundaries must
be recognized as tokens in text expressions of P.

(3) Strings.  The LISP string facilities are different in every
system and inadequate in all.  P will have its own string package
with a few primitives to be encoded in LAP for each object machine.
A string will be a series of glyphs; thus, the package would compute
widths and heights of text units such as words at high speed.
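
Item (3) says the string package would compute widths and heights of
text units.  The Python sketch below shows only the arithmetic,
assuming each glyph carries a width, height, and depth and that kerning
is a table of pairwise adjustments (the kerning profiles in the glyph
file header, described under GLYPH FILES, are one possible source).
Glyph, Kerns, and word_size are hypothetical names; the speed the text
asks for would come from LAP primitives, not from code like this.

.BEGIN VERBATIM
```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Glyph:
    name: str
    width: int      # hypothetical device units
    height: int     # above the baseline
    depth: int = 0  # below the baseline


Kerns = Dict[Tuple[str, str], int]  # (left, right) glyph names -> adjustment


def word_size(word: List[Glyph], kerns: Kerns) -> Tuple[int, int, int]:
    """Width, height, and depth of a word, including pairwise kerning."""
    width = sum(g.width for g in word)
    width += sum(kerns.get((a.name, b.name), 0) for a, b in zip(word, word[1:]))
    height = max((g.height for g in word), default=0)
    depth = max((g.depth for g in word), default=0)
    return width, height, depth
```
.END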

If a combined FORMATTER/PAGINATOR organization is followed, the
system will be written in LISP70 to take advantage of backtracking,
syntax-directed computation, coroutines, and edit strings.
Macros will have to be added to the scanner.
.SS ADVANTAGES AND DISADVANTAGES OF MLISP

Among the advantages of an MLISP implementation of the new
system are:

(1) Efficiency.  The language will be processed by an extension
of the existing MLISP compiler, which translates at 3000
lines per minute, more than three times faster than PUB
Pass One.  Most PUB macros could be procedures (EXPRs and
FEXPRs) in P, so their execution will be several times faster
than in PUB (PUB spends much of its time expanding macros).

(2) Flexibility.  Author procedures could directly call
or redefine procedures in the system.  During debugging,
the author could set breakpoints and perform traces.

(3) Portability.  The extended MLISP compiler will be written
mostly in STANDARD LISP, so that it will be transportable
to new installations with a minimum of effort.

The system should run equally well (except for speed differences) in
LISP1.6, TENEX-LISP, ILSP, MACLISP, and LISP70.  With a small amount
of LAP programming, it should also run in LISPs on computers other
than the PDP-10.

Disadvantages of MLISP are:

(1) Size.  The LISP1.6 version of the FORMATTER will probably be
nearly as large as PUB Pass One, because of LISP and MLISP overhead.
This will be remedied when LISP70 is operational.

(2) Inefficiency.  The PAGINATOR and POLISHER may be simple enough to
be programmed in machine language at a substantial gain in efficiency.
This may be done after portable LISP versions are operational.

A LISP70 implementation will be of comparable efficiency.  Backtracking
will tend to slow it down while data type declarations will tend to
speed it up.  It will be portable because LISP70 is portable, and
flexibility will be improved because of the extensible nature of the
language.
.SEC STANDARDS

The following file formats shall be standardized:

.BEGIN VERBATIM
(1) Individual Documents

	a. Manuscript.
	b. Galley and Document (same format).
	c. Cross-Reference Table.
	d. Galley Guide.
	e. Page Guide (similar to d?).

(2) Registry

	a. Glossary
	b. Directory
	c. Glyph File Header
	d. Curved Contour Representation
.END

The following programs shall be written in portable fashion:

(1) FORMATTER

(2) PAGINATOR

(3) POLISHER
.SEC REALIZATION

Manuscript and Registry standards shall be proposed by
Palo Alto and Galley and Document standards by Pittsburgh.

The FORMATTER shall be programmed by Rich Johnson and
Brian Harvey with assistance from Larry Tesler.

The PAGINATOR and POLISHER shall be programmed at CMU.

MLISP extensions shall be made at Stanford.
The ILSP implementation will be maintained by CMU, the
LISP1.6 (and later LISP70) implementations by Stanford,
and the TENEX-LISP implementation by Xerox.

If LISP70 is used, Stanford will maintain the system.

Each installation shall provide its own glyph converters,
text editors, device specifications, and printer/viewers.
However, the possibility of collaborating on XGP service
should be explored as the project proceeds.  CMU shall
be the motivating force and shall do most of the programming.

A target date of August 15 is suggested for a first version
of the system.  Although only a subset will be implemented
in the first version, the framework for supplying the
remainder must be provided.

This optimistic estimate is based on the fact that PUB
was completed in six months by one person in an
inappropriate language.  The new implementation is simplified
by separating pagination from filling and by building on
an existing compiler.  Although the new system has many
sophisticated facilities, they have all been done before in
some form by some of the implementors.
.SEC PROPOSAL FOR GRAPHICS LANGUAGE
by Robert Sproull

This is an editor's abstract of a typewritten document.

A document is composed of "boxes" with geometry, marked where
page breaks can occur.  Each box has a "body" and "i.d. info".
The body has printing rules.  The i.d. has names for
subtitling and positioning relative to other boxes.
Processing within each box is independent, allowing for
incremental compilation of a document.

LISP procedures are more useful than macros, e.g., to specify
line drawings in the graphics section.

Line-drawing primitives are suggested: absolute/relative point/line,
line or curve with thickness and texture, string (caption),
device-dependent code.

Floating-point coordinate system chosen by user.

Curves are given in terms of endpoints and control points.  The latter
are not necessarily on the curve, but guide the fitter.
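
The abstract does not name a curve type.  Bezier curves are one
familiar scheme with exactly this property, and the following Python
sketch (an assumption, not part of the typewritten proposal) evaluates
one by de Casteljau's repeated linear interpolation.  With the example
control polygon (0,0), (0,1), (1,1), (1,0), the curve's midpoint is
(0.5, 0.75), well away from both interior control points.

.BEGIN VERBATIM
```python
from typing import List, Tuple

Point = Tuple[float, float]


def bezier_point(control: List[Point], t: float) -> Point:
    """Point at parameter t (0..1) by de Casteljau's repeated interpolation."""
    pts = list(control)
    while len(pts) > 1:
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]


def flatten(control: List[Point], pieces: int = 8) -> List[Point]:
    """Approximate the curve by pieces + 1 points, e.g. for a vector device."""
    return [bezier_point(control, i / pieces) for i in range(pieces + 1)]
```
.END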

Program must be able to interrogate the state, including questions
like "How many inches would a vector of length dx,dy occupy?".
Other questions: resolution, string dimensions, aspect ratio.

A display procedure (cf. Newman, CACM) has arguments, prog variables,
and also a "master rectangle" within which it can draw.  A display
procedure call may optionally specify the instance rectangle, as well
as location, rotation, scale, and transform matrix.  The
system automatically applies these transformations from the user's
coordinate system to the page.
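
The transformation from user coordinates to the page, and the earlier
question "How many inches would a vector of length dx,dy occupy?", can
both be sketched with a 2-D affine matrix in Python.  This is an
illustration under stated assumptions: page coordinates are taken to be
inches, rotation is counterclockwise in degrees, and the names compose,
apply, instance, and page_inches are hypothetical.  The optional extra
matrix is applied first, then scale and rotation, then the move to the
instance location.

.BEGIN VERBATIM
```python
import math
from typing import Tuple

# Affine matrix (a, b, c, d, e, f): x' = a*x + c*y + e, y' = b*x + d*y + f.
Matrix = Tuple[float, float, float, float, float, float]
IDENTITY: Matrix = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def compose(m: Matrix, n: Matrix) -> Matrix:
    """The matrix that applies n first and then m."""
    a, b, c, d, e, f = m
    a2, b2, c2, d2, e2, f2 = n
    return (a * a2 + c * b2, b * a2 + d * b2,
            a * c2 + c * d2, b * c2 + d * d2,
            a * e2 + c * f2 + e, b * e2 + d * f2 + f)


def apply(m: Matrix, x: float, y: float) -> Tuple[float, float]:
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def instance(location=(0.0, 0.0), rotation_deg=0.0, scale=1.0,
             extra: Matrix = IDENTITY) -> Matrix:
    """User coordinates to page inches: extra matrix, scale, rotate, then move."""
    r = math.radians(rotation_deg)
    co, si = math.cos(r), math.sin(r)
    rotate_scale = (scale * co, scale * si, -scale * si, scale * co, 0.0, 0.0)
    move = (1.0, 0.0, 0.0, 1.0, location[0], location[1])
    return compose(move, compose(rotate_scale, extra))


def page_inches(m: Matrix, dx: float, dy: float) -> float:
    """How many inches would a user vector (dx, dy) occupy on the page?"""
    a, b, c, d, _, _ = m
    return math.hypot(a * dx + c * dy, b * dx + d * dy)
```
.END

Translation does not affect page_inches, since a displacement has no
location; only scale, rotation, and the extra matrix change its length.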

Display procedure calls draw within a "box" of given size as
mentioned earlier.
.SEC FIGURE 1
.VERBATIM
BLOCK DIAGRAM -- PART 1 OF 2

               +++++++++++
              |  SCRIBBLE |
               +++++++++++
                    |
                    ∨
               ***********
              |TEXT EDITOR|
               ***********
                    |
                    ∨
               -----------
              |           |
              | MANUSCRIPT|
              |           |
               -----------
                    |
                    |<-------------------------
 +++++++++          ∨                          |
|         |    ***********      ************   |
| MONITOR |<--| FORMATTER |--->|ALPHABETIZER|--
|         |    ***********      ************
 +++++++++     |        |
               ∨        ∨
    ------------    ----------
   |            |  |          |
   |GALLEY GUIDE|  |  GALLEY  |
   |            |  |          |
    ------------    ---------- 
.NEXT PAGE
BLOCK DIAGRAM -- PART 2 OF 2

    ------------    ----------
   |            |  |          |
   |GALLEY GUIDE|  |  GALLEY  |-----
   |            |  |          |     |
    ------------    ----------      |
             |                      |
             ∨                      |
            ***********             |
           | PAGINATOR |            |
            ***********             |
             |       |              |
             ∨       ∨              |
     -----------   -----------      |
    |           | |   CROSS   |     |
    |   PAGE    | | REFERENCE |     |
    |   GUIDE   | |   TABLE   |     |
     -----------   -----------      |
             |           |        -----
             |           |       |     |
             ∨           ∨       ∨     |
             ---------------------     |
                 |             .       |
                 ∨             .       |
            ***********        .       |
           |  POLISHER |       .       |
            ***********        .       |
                 |             .       |
                 ∨             .       |
            -----------        .       |                  +++++++++
           |           |       ∨       ∨   *********     |HARD COPY|
           |  DOCUMENT |------------------| PRINTER |--->|   OR    |
           |           |                  | /VIEWER |    | DISPLAY |
            -----------                    *********      +++++++++
.SEC FIGURE 2
GLYPH CONVERTER

         ----------
        |          |
        | REGISTRY |
        |          |
         ----------
             |
             ∨
         ***********
        | CONVERTER |
         ***********
          |       |
          ∨       ∨
 ------------    ---------
|  GLYPH     |  |  GLYPH  |
|DESCRIPTIONS|  | IMAGES  |
 ------------    ---------
.FILL
.STANDARD BACK