perm filename ENTITY.PUB[L70,TES] blob sn#009928 filedate 1972-06-27 generic text, type T, neo UTF8
00400	.BEGIN
00500	.CENTER
00600	.SKIP TO LINE 20
01400	DECEMBER, 1971
01500	.END
01700	.INDENT 5,0
02000	.TURN ON "↓_[]#"
02200	.	SKIP 2 ; SNAME ← "N" ⊃ ;

00200	.B
00300	LISP70  is intended to be an improvement on LISP 1.6 in the following
00400	ways:
00600	/A/	Extendability.		Programs may be written in  SEXPR-70,
00700	which is compatible with LISP, or in MEXPR-70, an Algol-like notation
00800	that may be extended or modified by the user to suit  his  notational
00900	preferences.   The concept of "entity" subsumes the concepts of atom,
01000	list, variable, and constant.  Every entity has  a  "type"  property,
01100	and depending on the type, may have other properties as well, such as
01200	"name", "value", "car", etc.  Various access methods can be  used  to
01300	find  and  process entities, depending on their scope, type, or other
01400	features.  The repertoire of  access  methods  and  types  is  easily
01500	augmented  to  improve both notational convenience and implementation
01600	efficiency.  All levels of the system are written in a parametric and
01700	flexible  manner  to enable extensions and modifications to be easily
01800	made.  Machine-dependent portions of the system are clearly  isolated
01900	to assist in the implementation of LISP70 on other computers.
02100	/B/	Backtracking.		In problems involving exploration  of
02200	a  problem  tree  or  maze, primitives are provided to save, restore,
02300	delete, and re-order states of the search.  From these primitives,  a
02400	variety  of depth-first and breadth-first search functions are easily
02500	defined.
02700	/C/	Pattern Matching.	Programs can be driven  not  only  by
02800	recursive functions but also by pattern rewrite rules.  A rule can be
02900	activated by referring to its name or by a standard  or  user-defined
03000	finding  algorithm.  These facilities are extremely useful in parsing
03100	and theorem proving.
03300	/D/	Processes.		The FUNARG device is  generalized  to
03400	allow  coroutines  with OWN variables that are preserved from call to
03500	call.  Normal functions are the special case of processes with no OWN
03600	variables that are created when called and destroyed when exited.
03800	/E/	Storage Allocation.	Storage   spaces  for  each  type  of
03900	entity (car-cdr, atom, binary program,  local  variables,  etc.)  are
04000	allocated  dynamically.  System routines are shared in upper segment.
04100	Thus, a job can start in a small core image and adjust  its  size  as
04200	needed.  Descriptors of functions and of large data structures have a
04300	presence bit so their referents can be swapped in and out by segments
04400	or by pages.
04500	.E

00200	LISP70 is to some extent a software emulator.  When you load LISP70,
00300	you can specify the system you would like to emulate (e.g., LISP 1.6,
00400	BBN-LISP, GEDANKEN, MLISP-2, QA-4, PLANNER) and a package of macros,
00500	rewrite rules, functions, and other definitions which emulate that system will be loaded --
00600	assuming someone has written them!  The point is, LISP70 is flexible
00700	enough to emulate all these, as well as Algol 68, PL/1, and even
00800	Fortran.
01000	Emulation is accomplished by having a very small yet general and
01100	efficient base language plus extendability at every level of the
01200	system.  This is a tall order, but thanks to lots of research in this
01300	area (Enea & Smith, Kay, Mitchell, Fischer, Reynolds, Earley, etc etc)
01400	it can now be done.
01600	The system is written in its own language, so even the base language is
01700	easy to change; however, it is probably general enough for most of us.
01800	A summary of that language is in order.
02000	LISP70 data is comprised of ↓_entities_↓, which in turn are made up of ↓_cells_↓.
02100	Each cell of an entity has two parts, a ↓_selector_↓ and a ↓_contents_↓.  The
02200	selector of each cell must be different from the selectors of all the other
02300	cells in the same entity, because it is used to identify the cell.
02500	The contents of a cell always points to an entity, and the selector also
02600	points to an entity.  An example of an entity with three cells is the
02700	following, which represents the LISP dotted pair (U.V):
02800	.B
02900		(U.V)
03000			 Selector  Contents
03100			 ------------------
03200			|  TYPE  |  PAIR   |
03300			|------------------|
03400			|   CAR  |   U     |
03500			|------------------|
03600			|   CDR  |   V     |
03700			 ------------------
03800	.E
03900	This diagram represents an entity that has three cells.  They are drawn one
04000	on top of another, but their order is really irrelevant.  The top cell in
04100	the diagram has the Selector TYPE and the Contents PAIR.  Actually, TYPE
04200	is represented by a pointer to the entity TYPE and PAIR by a pointer to
04300	the entity PAIR.  The second cell has the selector CAR and the Contents
04400	U.  The third cell has the Selector CDR and the Contents V.
04600	If E is an entity such as the one above, and it is desired to examine the
04700	Contents half of its CAR cell, one can simply say (CAR E).
04900	Sometimes one would like to alter the Contents half of a cell.  To
05000	place a pointer to X in the Contents half of the CAR cell of E, say
05100	(PLACE (CELL CAR E) X).  The way this works is as follows.  PLACE is
05200	a function whose first argument is an entity whose type is CELLPOINTER.
05300	The function (CELL CAR E) produces as its result a Cell Pointer to the
05400	CAR cell of E.  That Cell Pointer looks like this:
05500	.B
05600		(CELL CAR E)
05700			 Selector  Contents
05800			 --------------------
05900			|  TYPE  |CELLPOINTER|
06000			|--------------------|
06100			|SELECTOR|   CAR     |
06200			|--------------------|
06300			| OWNER  |    E      |
06400			 --------------------
06500	.E
06700	It was said earlier that the two halves of a cell always point at an entity.
06800	It is not possible for either half
06900	of a cell to point directly at an individual cell within an entity.
07000	To circumvent this quite intentional restriction, Cell Pointers are
07100	provided.  A Cell Pointer effectively points at an individual cell, by
07200	specifying the cell's Selector in its own SELECTOR cell and by specifying
07300	the entity that contains the cell in its own OWNER cell.
07500	It is easy to find out
07600	what cell a Cell Pointer points to.  If C is a Cell Pointer, then
07700	(SELECTOR C) is the Selector of the cell it points to, and (OWNER C)
07800	is the entity that contains the cell.  To alter the Contents of that
07900	cell to become X, write (PLACE C X).  To discover the Contents, write
08000	((SELECTOR C)(OWNER C)), or the equivalent built-in function, (CONTENTS C).
08200	Note that (CONTENTS (CELL CAR E)) = (CAR E).  This very important
08300	identity applies for any cell, not just (CAR E).  In fact, (CAR E)
08400	is defined as (CONTENTS (CELL CAR E)), which is considered more
08500	primitive by LISP70.  The rule is as follows: if S is not a function,
08600	then (S E) is an abbreviation for
08700	(CONTENTS (CELL S E)).  Functions are excluded because
08800	although they too are entities and thus can be used as selectors,
08900	an ambiguity is created when they appear in the first position of an S-expression.
09100	Suppose an entity S represents a one-dimensional array or "sequence".
09200	The selectors of its elements are the integers 1, 2, ... up to (LENGTH S).
09300	Then to access its third element, write (3 S).  To change its third
09400	element to 7, write (PLACE (CELL 3 S) 7).
09600	Now suppose an entity is to represent a two-dimensional array.
09700	This could be represented by a sequence of sequences, and element [3,4]
09800	accessed by ((3 S) 4) and changed by (PLACE (CELL (3 4) S) 7).
09900	Note that the CELL function in this case makes a Cell Pointer whose OWNER
10000	is S but whose SELECTOR is the list (3 4).  Thus, Cell Pointers can
10100	serve as general "locatives", "references", or "indirect addresses" of
10200	cells deep within data structures.

00200	Dottted pairs obviously fit quite well into the entity/cell scheme.  Property
00300	lists are just as easily represented.  Instead of using an (attribute.value)
00400	list as is done in LISP, it is more efficient to make a property list be
00500	an entity with cells selected by property indicators.  Thus,
00600	(GET N (QUOTE IND)) would be accomplished by (IND N), and
00700	(PUTPROP N V (QUOTE IND)) by (PLACE (CELL IND N) V).  LISP70 selects cells
00800	in property lists by hashing, which is faster than list searching.
01000	One might object that hashing is fine for property lists but terrible for
01100	evaluating (CAR E).  This is quite true, and LISP70 allows the method of
01200	CELL access to work differently depending on the types of the selector and
01300	entity involved.
01500	CAR and CDR are selectors of type "field", and dotted pairs
01600	are entities of type "record".  When CELL is presented with a field and a
01700	record, it utilizes the function INDEX to index in a block of consecutive storage.
01800	Properties are selectors of type "indicator", and
01900	property lists are represented by entities of type "node" (they are not called
02000	"atoms" because "atom" is the name of a predicate which means "not a dotted
02100	pair").  When CELL is presented with an indicator and a node, it utilizes
02200	the function PROPERTY to hash in a small table.
02400	To know what to do
02500	with each combination of types, CELL uses rewrite rules that are part of the
02600	universal evaluator (EVAL).  Schematically, these rules read like this:
02800	However, to be precise, one must include dummy variables and distinguish
02900	types from functions:
03000	.B
03100		(CELL $FIELD:F $RECORD:R) → (INDEX :F :R)  @ EVAL
03300	.E
03400	Types are preceded by "$".
03500	Dummy variables are preceded by ":".  Functions, constants, and other
03600	invariants of the rewrite rule are not preceded by anything.  The function
03700	(actually the environment) which is to use the rewrite rule is named at the
03800	end, preceded by "@".
04000	The evaluator has similar rules for other combinations of types, and the
04100	user can add new rules for new types or new combinations of interest to
04200	him.  Subscripting in a sequence and looking up a phrase in a lexicon are
04300	important access operations that have built-in rules:
04400	.B
04700	.E
04800	LOOKUP uses a variety of hash optimized for character strings, as opposed
04900	to PROPERTY, whose hash is optimized for uniformly distributed bit patterns
05000	(which is what indicators are).
05200	The rewrite evaluation of CELL allows the kind of generality achieved in
05300	GEDANKEN by functional data structures, while avoiding inefficiency when
05400	unnecessary.  Thus one might make the rule:
05500	.B
05700			BEGIN
05800			I ≤ J →→ (CELL J (CELL I M)) ;
05900			I > J →→ (CELL I (CELL J M)) ;
06000			END
06100		@ EVAL
06200	.E
06300	which for any matrix of type SYMMETRIC (not just one at a time, as in
06400	GEDANKEN), assignment to ((I J) M) also alters ((J I) M).
06500	Of course, rewrites can also be written for individual data structures:
06600	.B
06700			(CELL ($INTEGER:I 0) N) → .....
06800	.E
06900	which not only apllies just to matrix N but also only applies when the second
07000	subscript is 0.

00200	Rewrite rules are used throughout the evaluator.  Conceptually,
00300	LISP70 is interpreted using such rules at every step.  In reality, LISP70
00400	is compiled, and the compiler applies most of the rewrite rules at translation
00500	time.  There is no conflict between evaluator rewrite rules and rewrite rules
00600	that may resemble them in other processes, because rules can be local to
00700	specific environments (indicated above by "@EVAL").
00900	New rewrites can be added to the evaluator even at execution time, but in
01000	that case, some pieces of generated code might be marked INVALID (ala Mitchell) and
01100	recompiled taking account of the new rules next time they are evaluated.
01300	Rewrite rules are somewhat more general than suggested above (and can be
01400	generalized further by the user).  After "$" may appear not only a type
01500	but any predicate.  A type is a special case; any entity T of type TYPE can be used as
01600	a predicate defined by T=λx(type(x)=T).  In the rewrite system, this
01700	capability is described to the evaluator by:
01800	.B
01900		($TYPE:T :X) → (EQ (TYPE :X) :T)  @ EVAL
02000	.E
02100	Notice how important it is to distinguish the predicate TYPE from the
02200	selector TYPE using the "$".  Notice too that no predicate precedes :X
02300	and so any entity will match it.
02500	As further examples of rewrites, a couple from the evaluator will be shown:
02600	.B
02700		(PLUS :X :Y ::Z) → (PLUS (PLUS :X :Y) :Z)  @ EVAL
02800		(PLUS :X) → :X  @ EVAL
03000	.E
03100	The first example uses "::Z", which instead of matching a single entity
03200	can match one or more entities in a row.  If a predicate preceded it,
03300	it would be tested on each of the entities.  This rewrite converts
03400	calls on PLUS with three or more arguments to calls with two arguments.
03500	The second example simplifies calls on PLUS with one argument, and the
03600	third specifies that adding integers is accomplished by the lower
03700	level function IPLUS.  Other rewrites for PLUS of two arguments also
03800	exist for other combinations of types.
04000	Other features of rewrite rules are illustrated by:
04100	.B
04200		(IF :A THEN :B [ELSE :C]:Z) → (COND (:A :B) [if :Z then (T :C)])
04300	.E
04400	In the left hand pattern, "[]" surrounds an optional portion of the
04500	pattern.  If that portion is matched, Z will be set to TRUE, otherwise to
04600	FALSE.  In the right hand pattern, "[]" surrounds an M-expression nested
04700	within an S-expression; the M-expression decides on the basis of Z whether
04800	to include the T clause of the COND.
04900	.B
05000		(WHILE :B {DO|COLLECT}:DC :S) → (WHILEDC :B :DC :S)
05100	.E
05200	Here, alternative patterns are listed between "{}" and separated by "|".
05300	DC is bound to the pattern that worked: DO or COLLECT.

00200	The term "atom" in LISP has unfortunately become
00300	ambiguous.  The predicate "atom" tells whether an S-expression is not
00400	a dotted pair.  This predicate is so ingrained in LISP algorithms that
00500	it would be undesirable to change its name.  The other meaning of "atom"
00600	is a thing that has a property list -- usually obtained sneakily by taking
00700	CDR of the atom.  These two meanings are not entirely compatible, because
00800	although every thing with a property list also satisfies the predicate
00900	"atom", not everything that satisfies the predicate "atom" can have a
01000	property list.  In particular, in most modern LISP systems, including
01100	LISP70, a small integer number does not have a property list.
01200	We have chosen to call entities
01300	that have properties "nodes".
01400	For compatibility, there is a rewrite rule in the evaluator:
01500	.B
01600		(CELL CDR $NODE:N) → (PAIRUP :N)  @ EVAL
01700	.E
01800	where PAIRUP is a primitive that makes the selector-cell pairs of an
01900	entity into a list of entities of type COUPLE.  A couple is like a dotted
02000	pair except instead of a CAR cell and a CDR cell it has a CARPTR cell and
02100	a CDRPTR cell which point at the CAR and CDR cells of the corresponding
02200	node via Cell Pointers.  To make operations on these simulated property
02300	lists really affect the corresponding nodes, we define:
02350	.B
02600	.E
02800	Every entity has a type cell, and depending on the type, other cells
02900	may or may not be permitted.  An entity of type node can have any
03000	number -- and a varying number -- of cells, selected by entities of
03100	type indicator.  An entity of type record has a fixed number of cells,
03200	determined by its "record class", and selected by entities of type
03300	field that are also determined by its record class.  An entity of type
03400	SEQUENCE has a fixed number of cells, one of which is selected by the
03500	field LENGTH, and if (LENGTH X)=N, then the other cells of N are
03600	selected by the integers 1,2,...,N.  An entity of type CELLPOINTER has
03700	two cells besides TYPE: SELECTOR and OWNER.  And so on.
03900	Now we will fit numbers into this scheme.  A number has only a TYPE
04000	field and a NAME field.  The name field conceptually contains a pointer
04100	to a string (actually a SEQUENCE of characters) which is the print name of
04200	the number; thus the entity that represents the integer 2 has the
04300	TYPE integer and the NAME "2".  In reality, numbers are stored in
04400	binary and the operation (NAME N) where N is a number requires conversion,
04500	but this is only a detail of implementation.  Changing the cells of a number
04600	is prohibited by the rewrite rule:
04700	.B
04800		(PLACE $λx(number(owner(x)):N :V) →
04900		  (WARN "Attempt to place :V in ([selector(:N)] [owner(:N)]) ignored.")
05000	.E
05100	On the left side, a predicate is written in full as a lambda-expression.  On the right
05200	side, the ":" and "[]" features are used in character strings
05300	just as they have been used in S-expressions -- to escape
05400	into an M-expression.  This of course makes it tricky to include a ":",
05500	"[", or """ in a character string; to do it, the character must be
05600	preceded by an override character selected by the user.
05800	Next, files will be put into this scheme.  A file is merely an entity stored
05900	on an external medium.  It is perfectly possible to store a list or graph
06000	structure -- property lists included -- on a file, in binary, not in
06100	ASCII.  This is not possible in LISP because atoms are always global -- two
06200	atoms with the same name are normally considered to be the same atom.
06400	In LISP70, nodes (and in fact any entities) can be local to a
06500	certain environment.  A file is such an environment; it can have its
06600	own nodes distinct from nodes of the same name in other environments.
06700	If translation to the name-node pairings of another environment is desired, 
06800	it is easily accomplished by taking (NAME N) for each node N on the file
06900	and looking it up in the name-node lexicon of the desired environment.
07000	This is equivalent to INTERNing in the present LISP, but it avoids character
07100	scanning on the input medium, and permits other structures than S-expressions
07200	(e.g., cyclic graphs) to be represented cleanly on a file.

00200	A variable is simply an entity that has in addition to a TYPE and other
00300	cells a cell selected by the field VALUE.  To evaluate a variable V,
00400	use (VALUE V).  To store in it as in LISP's (SETQ V Z), use
00700	Note that a variable is not the VALUE cell
00800	of an atom, nor a position on the stack, nor the identifier used to
00900	refer to it.  A variable is a unique entity.  The variable named "X"
01000	in one block of a program may be a different entity from the variable
01100	named "X" in another block.  To go from the name to the entity, a
01200	name-variable lexicon is maintained in each environment.  Usually,
01300	the compiler converts names to variables at compile time, but for
01400	EVAL this is not possible, so the tables are kept around at execution
01500	time.
01700	For flexibility and efficiency, the type of a variable can be qualified
01800	by specifying the permissible types of its value and by indicating
01900	the inclusion or exclusion of various additional cells in the variable.
02100	A variable whose VALUE cell always points to an integer would be given
02200	the type "integer variable".  One that can point either to an integer
02300	or to a node would be given the type "{integer|node} variable".  One that
02400	can point to another variable which is itself of type "integer variable"
02500	would be given the type "integer variable variable".  It is also possible
02600	to allow a variable's value to point to any entity at all; in that case,
02700	it is given the type "general variable".
02900	In addition to a TYPE cell and a VALUE cell, a variable may or may not
03000	have other cells.  Every variable that arises from an identifier in a
03100	written program of course has a name, and this appears in the NAME cell
03200	of the variable during compilation and debugging runs (it can be
03300	suppressed in production runs).  Some variables do not have names , e.g.,
03400	"gensyms".  Thus we distinguish a "named variable" from an "unnamed
03500	variable".  Another cell used in debugging is the MONITOR cell, which
03600	may point to a routine to be called whenever the variable's value cell
03700	is changed.  Variables with space for such a cell are called "monitorable
03800	variables" and other variables are called "unmonitorable variables".
04000	Distinctions such as "global" vs. "free" vs. "public" vs. "private"
04100	relate not to variables but to identifiers in the program.  If in some
04200	block, "X" is declared global, then the global name-variable lexicon
04300	is used to determine which of the many variables whose names are "X"
04400	is designated (i.e., ("X" GLOBAL) is evaluated).  If "X" is declared "free" in a block (usually implicitly),
04500	then the name-variable lexicon of the environment is used; this might
04600	involve a search through the calling function, its caller, etc., until
04700	the variable is found.  "Free" identifiers allow correct handling of
04800	FUNARGS, coroutines, and cooperating processes.  However, they are
04900	somewhat inefficient and are not needed when these features are not
05000	used or when precautions against name conflicts are taken.  In such
05100	cases, "public" identifiers can be used, which (like SPECIAL variables
05200	in LISP 1.6) obey a push-down discipline and can be accessed without
05300	searching.  Finally, for utmost efficiency, "private" identifiers can
05400	be used, which are only accessed in the current function body and thus
05500	can be fetched by indexing in a SEQUENCE of variables (corresponding to the
05600	local portion of the regular push-down stack of LISP 1.6).  Other
05700	"access" methods for variables can be defined by the user.

00200	Functions are vitally important in LISP, and LISP70 considers a function
00300	to be an entity with an optional NAME (the "LABEL"), a FORMALS cell with
00400	a list specifying each bound identifier and whether it is
00500	free, public, or private, and whether passed quoted or evaluated,
00600	and a BODY cell containing the S-expression of the body
00700	of the function or a "code" entity containing a
00800	binary program.
01000	When a function is called, an entity of type "process" is created
01100	with cells for each new variable in its scope, for temporary
01200	results (their number can usually be computed by the compiler),
01300	and for a link to the calling process.  For those variables named
01400	by free identifiers, the lexicon of the environment is updated;
01500	for those that are named by public identifiers, the public push-
01600	down list of the identifiers are updated; for private identifiers,
01700	nothing need be done.
01900	A normal function exit releases the space used by the process and
02000	restores control to the calling process.  Coroutine and FUNARG linkages
02100	do not release the space.
02300	In addition to functions, LISP70 allows "rewrite rules", useful for
02400	parsing, memory models, theorem proving, and other sophisticated
02500	applications (and even simple ones).  A rewrite rule is an
02600	entity of type "rewrite", with a SOURCE cell and a PRODUCT cell each of which
02700	points to a "pattern" entity, and a FORMALS cell which lists the dummy variables of
02800	the rule.  The application of a rewrite creates a process just like a
02900	function call, but the binding system is stream-to-pattern instead
03000	of list-to-list.  Another difference is that while functions
03100	are always called by explicit mention, the rewrite to be applied in a
03200	situation can be discovered as the result of a pattern-matching
03300	search.  The search is driven by tables based on the SOURCE pattern in conjunction with
03400	finding routines provided by the system or by the user.  The rewrite
03500	system is sophisticated enough to allow the alternative definition
03600	of CONS, CAR, and CDR by the rules:
03700	.B
03800		(CAR (CONS :X :Y)) → :X
03900		(CDR (CONS :X :Y)) → :Y
04000	.E

00200	So far it has not been necessary to discuss how entities are
00300	created and how they get cells.  The function (MAKE TY) makes an
00400	entity of type TY and returns that entity as its value.  MAKE looks
00500	at a table associated with the type to determine how many and which
00600	cells to initiallize, and whether to use consecutive storage, hash
00700	tables, or some other allocation scheme.  For most entities, new
00800	cells can not be added after creation.  However, for some (like nodes),
00900	they can be added.  This is simply accomplished by using PLACE; if
01000	PLACE finds that a cell does not exist, it adds it if possible.
01200	The usual LISP functions SETQ, LIST, INTERN, READ, etc. are easily
01300	defined as macros or functions in LISP70.  Thus, LISP70 is trivially
01400	extended to emulate LISP 1.6 or BBN-LISP.  To emulate MLISP-2,
01500	PLANNER, or QA-4, backtracking must be added.  This is done by
01600	adding a CONTEXT cell and sometimes other cells to each variable and
01700	to certain other entities.  The details of implementation vary
01800	from language to language.  For users who do not require backtracking,
01900	these extra cells and the corresponding backtrack primitives need
02000	not take up space and time.
02200	LISP70 provides powerful yet simple and efficient low-level
02300	extendability to LISP70.  Extendability at higher levels is provided
02400	by LET statements (the context-sensitive parsing rules of MLISP-2),
02500	by macros, and by rewrite rules.  Furthermore, the
02600	code generators are highly parametrized so new data types and
02700	access methods are easily taught to it.  Finally, machine-dependent
02800	and machine-independent portions of the system are clearly separated
02900	so that conversion to other computers is a simple matter.