
Lawrence G. Tesler
Horace J. Enea
David C. Smith

Stanford University

February, 1973

LISP70 is an extensible and portable successor to LISP.  It is compiler-based
rather than interpreter-based, and emphasizes pattern-directed computation.


This is a limited circulation draft of a paper to be submitted to a
conference or journal.  No right to publicize its contents is granted.

During the past decade,
LISP [{REF LISP}] has been a principal programming language for artificial
intelligence and other frontier applications of computers.
Like other widely used languages, it has spawned many variants,
each attempting to make one or more improvements.
Among the aspects that have received particular attention are notation
[{ref ISWIM}, {ref MLISP_ENEA}, {ref MLISP_SMITH}, {ref LISP2}],
control structure,
interactive editing and debugging [{ref DWIM}], and execution efficiency.

A need for a successor to LISP has been recognized [{ref BOBROW}], and
several efforts in this direction are under way.  The approach taken by
BBN-LISP is to begin with an excellent debug-oriented system [{REF BBN_LISP}]
and to add on flexible control structure [{ref BOBROW_WEGBREIT}] and Algol-like
notation [{REF CLISP}].
The approach taken by LISP70 and by the LISP-related
ECL [{REF WEGBREIT}] is to begin with an extensible kernel language which
users with a variety of needs can tailor and tune.

"Tailoring" a language means defining facilities which assist in the solution
of particular kinds of problems which may have been unanticipated by the
designers of the kernel.  "Tuning" a language means specifying more efficient
implementations for statements which
are executed frequently in particular programs.

A language that can be used on only one computer is not of universal utility;
the ability to transfer programs between computers increases its value.
However, a
language that is extensible both upward and downward
is difficult to transport if downward extensions mention
machine-dependent features [{REF DOWNWARD}].  LISP70 generates code for an
"ideal LISP machine" called "ML" and only the translation from ML to object
machine language is machine-dependent.  Thus, downward extensions can be
factored into a machine-independent and a machine-dependent part, and during
program transfer, the machine-dependent recoding (if any) is clearly isolated.

Ease of inter-machine transfer is called "portability" and flexibility of
language facilities is called "extensibility".  These two aspects of LISP70
are discussed at length in [{REF LISP70_SYMPOS}].

The present paper treats a wide range of subjects briefly:
why LISP70 is a compiler-based system; its
control structure; streaming; MLISP notation;
extensible functions; and automatic ordering of pattern rewrite rules.
These areas have been selected for discussion either
because they are not well known or because they appear to be
controversial among workers in the field.

Among the improvements to LISP that are not discussed are data type mechanisms,
dynamic storage allocation, file processing, gremlins, multiple oblists,
inverse functions, generic functions, and memo functions.

Most LISP systems are interpreter-based.  An interpreter is easy to
program (especially in LISP), and powerful debugging aids are easy to
include [{ref DWIM}].  The only disadvantages of an interpreter on most
existing computers are low code density and slow execution.  To
compensate for these faults, most LISP systems offer a compiler to
produce fast code for debugged programs.

When both an interpreter and a compiler are implemented in a single
system, they must be completely compatible to be of any practical
utility.  In an extensible system, the problem is compounded,
because each change to the interpreter must be reflected exactly in
the compiler.

LISP70 is intended to be a production system, not a research project.  To
achieve maximum performance, it is compiler-based.
Every attempt has been made to preserve all the advantages of an
interpreter without sacrificing the speed of a compiler.  Some of the methods are
discussed in the ensuing paragraphs.

When a LISP function is defined or redefined in the system, it is not immediately
compiled.  To avoid compiling a whole program just to find an early bug,
compilation of each function is deferred until the first time it is invoked.

The compiler is resident to avoid a loading process at each use.  It is made
fairly fast and compact by extensive use of streaming techniques, which will
be discussed later in this paper.

The debugging facilities of an interpreter are simulated by use of
"interception" and "source correspondence", as follows.

Every function call and data element access, as well as every user-specified
variable reference, is implemented by indirect addressing or its equivalent
on the object computer.
By altering the contents of the indirect cell, a jump or trap to the debugger
can be caused to intercept each reference to a monitored entity.
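The indirection scheme can be sketched in modern terms.  The following is a
minimal illustration in Python (LISP70 itself is written in MLISP; all names
here are illustrative): callers always go through an indirect cell, so the
debugger can monitor an entity simply by altering the cell's contents.

```python
# Sketch of call interception via an indirect cell.  Every reference to the
# function goes through the cell, so swapping the cell's target is enough to
# "trap to the debugger" on each call, then continue to the real code.
class IndirectCell:
    def __init__(self, fn):
        self.fn = fn                 # current target of the cell

    def __call__(self, *args):
        return self.fn(*args)        # one extra indirection per call

def square(x):
    return x * x

cell = IndirectCell(square)          # all callers reference the cell

trace = []                           # records intercepted arguments
def intercepted(x):
    trace.append(x)                  # debugger hook fires first
    return square(x)                 # then the original code runs

cell.fn = intercepted                # monitor by altering the indirect cell
```

Uninstalling the monitor is the inverse operation: restore the cell's original
contents and callers again reach the real code with no debugger overhead beyond
the indirection itself.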

Source correspondence lets the debugger determine at an execution error
or breakpoint the corresponding point in the original source program.
The correspondence is not remembered for every instruction, but only for
landmarks such as a LAMBDA or a call on EVAL.  At each
landmark, the compiler remembers the source code and object code positions as well as
environment information such as the contents of the stack.

To determine the correspondence of points other than landmarks, the
compiler is initialized to the next earlier landmark and code is recompiled
until the desired object instruction is generated.
At that point, the compiler knows what is on the stack and in each register as well as other
information vital for debugging.  The time taken to regenerate the environment
for a particular instruction is negligible in an interactive mode of operation
such as occurs during debugging.

The LISP function EVAL(EXP) is supposed to evaluate the S-expression EXP
in the current environment.  Having no interpreter, LISP70 compiles EXP and
executes the object code.  The compiler is called by
COMPILE(EXP,ENV), where ENV is an "environment descriptor" referring
to the compiler's landmark associated with that call on EVAL.
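A rough analogue of this EVAL-by-compilation scheme can be written in Python,
whose built-in compile and eval stand in for LISP70's COMPILE and the execution
of its object code; a plain dict stands in for the environment descriptor ENV
(the function name is illustrative, not from the source):

```python
# Sketch of EVAL in a compiler-only system: the expression is compiled on the
# fly and the resulting object code is run in the environment captured at the
# nearest landmark (modeled here as a dict of variable bindings).
def lisp70_eval(exp, env):
    code = compile(exp, "<eval>", "eval")   # plays the role of COMPILE(EXP, ENV)
    return eval(code, {}, env)              # run object code in that environment

env = {"x": 40, "y": 2}                     # bindings at the landmark
result = lisp70_eval("x + y", env)
```

The point of the sketch is only that no interpreter is needed: evaluation of a
dynamically supplied expression reduces to compilation plus a jump into the
freshly generated code.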

The kernel LISP70 system uses a restricted control structure to achieve
maximum performance in simple applications and in the interpretation of advanced
control structures for more complex applications.

The only multiple processes that are allowed in the kernel are independent
coroutines.  Although they may share property lists and "common"
variables, their ALISTs are completely separate.  Coroutines are
useful in synchronous applications such as monitoring, streaming,
and macro-compilers, but not in asynchronous problems such as systems.

The kernel has choice, failure, prune, and suspend primitives which
operate on a context stack.  A choice function creates a branch from the
current context and makes that branch be the new current context.
Failure removes the current context from the stack, makes the parent context
current, and restores the states of all processes and
data structures to what they were last time this context was current, except
for specifically exempted entities.
Pruning removes non-current contexts from the stack when they are no longer needed.
Suspending permutes the context stack so that the current context is suspended
and an ancestral context becomes current to try alternative choices.
If these choices prove unsatisfactory, the suspended context can be resumed.
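The choice and failure primitives can be illustrated with a small Python
sketch of a context stack (prune, suspend, and the exemption mechanism are
omitted for brevity; class and method names are illustrative):

```python
# Sketch of a context stack with choice and failure.  CHOICE pushes a new
# current context; each assignment made while a context is current records the
# variable's prior value; FAILURE pops the current context and undoes those
# assignments, restoring the parent context's view of the data.
class ContextStack:
    def __init__(self):
        self.vars = {}
        self.trail = []              # one undo list per context on the stack

    def choice(self):
        self.trail.append([])        # branch: new current context

    def assign(self, name, value):
        if self.trail:               # remember old value for restoration
            self.trail[-1].append((name, self.vars.get(name)))
        self.vars[name] = value      # simplification: None means "was unbound"

    def failure(self):
        for name, old in reversed(self.trail.pop()):
            if old is None:
                self.vars.pop(name, None)   # variable was unbound before
            else:
                self.vars[name] = old       # restore prior value

ctx = ContextStack()
ctx.assign("a", 1)       # assignment outside any choice survives failures
ctx.choice()
ctx.assign("a", 2)
ctx.assign("b", 9)
ctx.failure()            # undoes both assignments made in this context
```

Because the trail is strictly last-in first-out, restoration is a simple pop,
which is the source of the speed advantage over context-tree systems claimed
below.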

Assignments, fetches, and property list operations can be done "in context" to
survive failures or to communicate information among cooperating contexts.
Out of the primitive contextual operations, a variety of useful control
functions can be defined.

The context stack scheme runs considerably faster for most
purposes than context-tree systems [{REF QA4}] because
LIFO storage management avoids costly garbage collection of rapidly generated
tree branches, and because fetches are not slowed down by
context-guided searches.
The scheme only becomes inferior when suspended search is a primary technique,
as in "multiple world" planning programs: frequent permutation of the context stack
can be a very expensive operation.
The implementation is based on [{REF MLISP2_BACK}], with the addition of
fetches in context, generalized pruning, and suspended search.

It is well-known how to achieve a variety of interesting control structures
-- including context trees --
utilizing the "activation record" scheme [{REF WEGNER}, {REF BOBROW_WEGBREIT}].
LISP70 offers convenient facilities for designing activation records and for
defining primitives to build control structures out of them.  Furthermore,
a multi-level tabular compiler organization [{REF LISP70_SYMPOS}] allows
specialized efficient code to be generated for the most frequently executed
control operations identified by the designer.

A possible research project for someone interested in control structures
would be a program that takes a control interpreter operating on ALISTs
and produces an equivalent activation-record-based or stack-based interpreter.
The latter interpreter could then be added to BBN-LISP or could be used to
generate compiler extensions for ECL or LISP70.

Much of the processing performed by computers is sequential, including
serial input-output, list scanning, and character string parsing.  To
simplify the description of sequential processing, LISP70 features a
"streaming" facility.  Any sequential data structure (such as a text file,
a list, or a string) may be treated as a "source" or as a "sink".  A
"source pointer" may be pointed at any element of a source, and a
"sink pointer" may be pointed at the last element of a sink.  Data may be
"streamed" from a source beginning at the element designated by the
source pointer, and data may be "streamed" to the end of a sink.

The primitives important in streaming are the functions SOURCE_POINTER,
SINK_POINTER, THIS, and NEXT.

If S is a sequential data structure, then SOURCE_POINTER(S, E) produces a
source pointer pointing to the first element of S, while SINK_POINTER(S, E)
produces a sink
pointer pointing to the last element of S.  The argument E is used to define
the nature of an "element".  For example, if S is a text file, it may
be regarded as a series of characters, as a series of tokens, or as a
series of S-expressions.  If the argument E is omitted, then the elements are
assumed to be S-expressions.

If SP is any pointer, then THIS(SP) returns the element at which SP points.
If SP is a source pointer in particular,
then NEXT(SP) returns THIS(SP) and advances SP one
element to the right.
If SP is a sink pointer, then NEXT(SP)←X lengthens the sink by one element,
makes SP point to the new last element, and makes that element be X.

If SP points past the end of a source, then THIS(SP) returns the special
value EOF (end of stream) and NEXT(SP) calls FAILURE().
If SP points to an empty sink, then THIS(SP) returns the special value BOF
(beginning of stream).
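These pointer primitives can be sketched in Python (LISP70's own notation is
MLISP; the classes, the next_set method standing in for NEXT(SP)←X, and the
sentinel objects are all illustrative):

```python
# Sketch of streaming pointers: THIS peeks at the element under the pointer;
# NEXT on a source returns that element and advances one element to the right;
# NEXT(SP) <- X on a sink appends X and points SP at the new last element.
EOF = object()   # returned by THIS past the end of a source
BOF = object()   # returned by THIS on an empty sink

class SourcePointer:
    def __init__(self, seq):
        self.seq, self.pos = seq, 0

    def this(self):
        return self.seq[self.pos] if self.pos < len(self.seq) else EOF

    def next(self):
        e = self.this()
        if e is EOF:
            raise StopIteration("FAILURE()")   # NEXT past the end fails
        self.pos += 1
        return e

class SinkPointer:
    def __init__(self, seq):
        self.seq = seq

    def this(self):
        return self.seq[-1] if self.seq else BOF

    def next_set(self, x):       # stands in for NEXT(SP) <- X
        self.seq.append(x)       # lengthen the sink; point at the new element
```

Any sequential structure can sit behind either pointer; only the four
operations above are visible to the streaming code.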

A pointer may be pointed at a process.  In that case, NEXT activates the
process until it outputs a "source element" or accepts a "sink element".

Backtracking restores the positions of all pointers and the states of all
sources and sinks except those specifically exempted by the programmer.

The following MLISP program copies the file A to the end of the list L, omitting all
instances of the number 4 from the copy.
It uses the predicate SUCCEEDS, which returns FALSE
if the evaluation of its argument
calls FAILURE(), and otherwise returns non-FALSE:
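A sketch of that computation in Python (the MLISP listing is omitted here;
the function name is illustrative, and SUCCEEDS is modeled by catching the
exception raised when NEXT runs past the end of the source):

```python
# Stream every element from source A to the end of sink L, omitting all
# instances of the number 4.  Running off the end of A plays the role of
# FAILURE(); catching it plays the role of SUCCEEDS returning FALSE.
def copy_omitting_fours(a, l):
    pos = 0
    while True:
        try:
            x = a[pos]           # NEXT(source): fails at end of stream
            pos += 1
        except IndexError:       # SUCCEEDS(NEXT(...)) returned FALSE
            return l
        if x != 4:
            l.append(x)          # NEXT(sink) <- X
```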

Notation is not the most important aspect of a programming
language.  With poor notation it may be difficult to read and write
programs, but with inadequate control structure, programming can be
nearly impossible.  Yet notation is important, and deserves attention
just as the form of a building receives attention after its structural
soundness is assured.  We have all seen or written programs which did
wonderful things, but which no one could read -- with the possible exception
of their authors.  Good notation can make programming errors both less likely
and easier to spot, and can make programs a good deal shorter and more easily
managed by person and machine alike.

LISP70 deals in two basic notations:
the Algol-like M-expressions of MLISP [{REF MLISP_ENEA}, {REF MLISP_SMITH}],
and fully parenthesized S-expressions.  S-expressions are used at
several language levels: LISP, ML, and LAP.  LISP is the principal language
of the system,
easily manipulated as data and easily compiled.
ML is the imaginary order code of a Burroughs-like stack machine, to which LISP is
translated by the compiler.  LAP is an assembly language for the object
computer, generated from ML by machine-dependent tables.  User-written
programs are normally in MLISP, and are translated to LISP by the compiler
as soon as they are input.

If the user prefers a top-level notation other than MLISP or
LISP, it is possible to replace the MLISP-to-LISP translation tables by
tables for a different language such as Algol.  This could also require the
addition of new semantic primitives such as block structure.  After designing
the implementation, the user would add new rules to the LISP-to-ML tables and
if necessary to the ML-to-LAP tables to handle the new primitives.  Any
intrinsic functions that were needed would be written in MLISP like the rest
of the LISP70 system.

It is possible to implement a multi-language multi-computer compiler
organized as in the following diagram:
MLISP		QA-4		ALGOL-60	other

  ↓		  ↓		    ↓		 ↓
		      LISP

  ↓		       ↓

  ↓		       ML

  ↓		       ↓
  ↓		  ↓		   ↓		  ↓
B1700		PDP-10		IBM 360		other
For example, ALGOL-60 could be translated to an appropriately extended LISP,
the LISP could be translated to ML, and ML could be translated to the
machine code of the IBM 360.  In the case of a microprogrammed
machine like the B1700, or in the case of an interpreter-based implementation,
the LISP code could be executed directly without going through ML.

The use of the intermediate machine-independent language ML might appear to
decrease the efficiency of the object code and of the compiler.
Brown [{REF LOWL}] has shown that with proper design a penalty of only
2.5% to 6% in extra object instructions need be paid for the intermediate
use of a low-level language.  LISP70 further enhances efficiency by use of
a machine-dependent peephole optimizer in critical portions of a program.
To minimize
compiling overhead, LISP70 "pipelines" the translation steps as described
later in this paper.
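The flavor of a peephole pass can be conveyed by a toy Python version (the
instruction patterns below are illustrative, not LISP70's actual ML rules):

```python
# Toy peephole optimizer: scan the instruction stream once, looking at each
# new instruction against the last one emitted, and rewrite wasteful pairs.
def peephole(code):
    out = []
    for ins in code:
        if out and out[-1][0] == "PUSH" and ins == ("POP",):
            out.pop()             # PUSH x; POP cancels (x's value is unused)
        elif out and out[-1][0] == "LOAD" and ins[0] == "LOAD":
            out[-1] = ins         # back-to-back LOADs: first result is dead
        else:
            out.append(ins)       # no pattern applies: emit unchanged
    return out
```

Because each rule only inspects a bounded window of adjacent instructions, the
pass is linear in program size and can be confined to the critical portions of
a program, as the text describes.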

The semantics of rewrite rules do not require that decomposition and recomposition
proceed from left-to-right,
nor even that decomposition finish before
recomposition begins.  The implementor is allowed
some latitude in choosing matching and generation algorithms. Left
recursion difficulties can be avoided by right-to-left processing, by look-ahead,
or by iteration, and special cases can be recognized by the compiler and optimized.
There are excellent topics for research projects in this area.

In most instances, strictly left-to-right top-down processing
with backtracking is both sufficient
and easiest to implement.  If the compiler determines that this is the correct
approach, it can translate each rewrite rule to an equivalent algorithm
[{REF LISP70_SYMPOS}] using the streaming facilities of LISP70.

When streaming is employed,
the input stream is scanned as a source
and the output stream is filled as a sink.
The DEC pattern works from
left to right advancing the source pointer as it matches the input,
and the REC pattern
works from left to right advancing the sink pointer as it generates the output.

The source and sink can be any sequential data structure
or process.  Alternatively, the source can be an argument list passed to the
extensible function and the sink can be the result returned by the function.
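The compiled form of one left-to-right rule can be sketched in Python (the
rule itself, rewriting NOT NOT x to x over a token stream, is a toy chosen for
illustration):

```python
# One rewrite rule compiled into streaming form: the DEC side matches at the
# source pointer and advances it; the REC side emits to the sink as it goes.
def apply_rule(tokens):
    out, i = [], 0
    while i < len(tokens):
        # DEC pattern: does NOT NOT x match at the source pointer?
        if tokens[i:i + 2] == ["NOT", "NOT"] and i + 2 < len(tokens):
            out.append(tokens[i + 2])   # REC pattern: emit just x
            i += 3                      # advance source past the match
        else:
            out.append(tokens[i])       # no match: copy one element through
            i += 1
    return out
```

Decomposition and recomposition interleave here, exactly as the semantics
permit: each matched fragment is emitted before the rest of the input has been
examined.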

It is possible to set up a pipeline of coroutined
processes as diagrammed below [cf. {REF KAY_PIPE}]:
 -----	 -----	 -----	 -----	 ---------   ------
|MLISP| |MLISP| |LISP | | ML  | |         | |      |
|text |→| -to-|→|-to- |→|-to- |→|   LAP   |→| code |
|file | |LISP | | ML  | | LAP | |assembler| |vector|
|     | |tran.| |tran.| |tran.| |         | |      |
 -----	 -----	 -----	 -----	 ---------   ------
The source of each translator is the adjacent process or file on its left, and its
sink is the adjacent process or vector on its right.  By streaming data directly
from process to process, most intermediate results can be stored on stacks and the
high cost of list-structure garbage is avoided.  It is largely by this means that
the penalty for using a four-step translation is minimized.
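Python generators give a compact analogue of such a coroutined pipeline (the
stage bodies below are stand-ins, not real translation rules; only the wiring
matters):

```python
# Each translator streams from the stage on its left: elements flow through
# one at a time, so no intermediate representation of the whole program is
# ever materialized between stages.
def mlisp_to_lisp(chars):
    for c in chars:
        yield c.upper()               # stand-in for MLISP-to-LISP translation

def lisp_to_ml(tokens):
    for t in tokens:
        yield ("ML", t)               # stand-in for LISP-to-ML translation

def pipeline(text):
    return list(lisp_to_ml(mlisp_to_lisp(text)))
```

Demand drives the pipeline: pulling one element from the final stage pulls
exactly what is needed from each earlier stage, which is how intermediate
results stay on stacks instead of in garbage-collected list structure.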

Many of the design decisions of LISP70 are contrary to trends seen in other
"successors to LISP".  The goals of these languages are similar, but their
means are often quite diverse.

In interpreter-based systems, speed is sacrificed to gain flexibility
and debugging power.  However, these
advantages can be made available in a compiler-based system
without a large loss of efficiency as described in this report.

Some applications demand complex control structures, but most
applications do not, and LISP70 does not make them pay the price.
Extension packages will be provided for advanced control structures,
and sophisticated users can define their own.

Concern with good notation does not have to compromise the development
of powerful facilities; indeed, good notation can make those facilities
more convenient to use.

Emphasis on pattern-directed computation does not overly constrain the
programmer accustomed to writing algorithms.  Rewrites and algorithms can
be mixed, and the most appropriate means of defining a given function can be
selected.  LISP70 does not limit the use of pattern rewrite rules to
a few facilities like goal-achievement and assertion-retrieval; a set of
rules can be applied to arguments like any other function, and can
stream data from any type of structure or process to any other.

Automatic ordering does not prevent the programmer from seizing control,
but allows him to relinquish control to a procedure of
his choosing to save him tedious study of an existing program when making changes.

The LISP70 kernel is being debugged on a PDP-10
at the time of this writing (February, 1973).
The language has already been used successfully
in programs for question-answering and planning.
After the kernel can reliably compile itself, extensions are planned to
improve its control structure, editing, and debugging capabilities, and
versions will be bootstrapped to other computers.