perm filename CLVALI.MSG[COM,LSP] blob sn#846320 filedate 1987-09-25 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00084 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00011 00002	Introduction
C00015 00003	∂23-Sep-84  1625	RPG  	Introduction  
C00019 00004	∂02-Oct-84  1318	RPG  	Chairman 
C00020 00005	∂05-Oct-84  2349	WHOLEY@CMU-CS-C.ARPA 	Chairman     
C00024 00006	∂13-Oct-84  1451	RPG  	Chairman 
C00025 00007	∂27-Oct-84  2159	RPG  	Hello folks   
C00028 00008	∂27-Oct-84  2202	RPG  	Correction    
C00029 00009	∂02-Nov-84  1141	brown@DEC-HUDSON 	First thoughts on validation    
C00038 00010	∂04-Nov-84  0748	FAHLMAN@CMU-CS-C.ARPA 	Second thoughts on validation   
C00047 00011	∂07-Nov-84  0852	brown@DEC-HUDSON 	test format 
C00085 00012	∂09-Nov-84  0246	RWK@SCRC-STONY-BROOK.ARPA 	Hello   
C00094 00013	∂12-Nov-84  1128	brown@DEC-HUDSON 	validation process    
C00098 00014	∂12-Nov-84  1237	FAHLMAN@CMU-CS-C.ARPA 	validation process    
C00102 00015	∂12-Nov-84  1947	fateman%ucbdali@Berkeley 	Re:  validation process 
C00104 00016	∂13-Nov-84  0434	brown@DEC-HUDSON 	Confidentially loses  
C00105 00017	∂18-Dec-85  1338	PACRAIG@USC-ISIB.ARPA 	Assistance please?    
C00106 00018	∂12-Mar-86  2357	cfry%OZ.AI.MIT.EDU@MC.LCS.MIT.EDU 	Validation proposal 
C00118 00019	∂13-Mar-86  1015	berman@isi-vaxa.ARPA 	Re: Validation proposal
C00121 00020	∂13-Mar-86  1028	berman@isi-vaxa.ARPA 	Re: Validation proposal
C00124 00021	∂17-Mar-86  0946	berman@isi-vaxa.ARPA 	Re: Validation proposal
C00126 00022	∂19-Mar-86  1320	berman@isi-vaxa.ARPA 	Re: Validation Contributors 
C00128 00023	∂27-Mar-86  1332	berman@isi-vaxa.ARPA 	Validation Distribution Policy   
C00130 00024	∂29-Mar-86  0819	FAHLMAN@C.CS.CMU.EDU 	Validation Distribution Policy   
C00133 00025	∂16-Jun-86  1511	berman@isi-vaxa.ARPA 	Validation Suite  
C00135 00026	∂09-Jul-86  1213	berman@vaxa.isi.edu 	Validation Control 
C00138 00027	∂22-Jul-86  1344	berman@vaxa.isi.edu 	test control  
C00142 00028	∂23-Jul-86  2104	NGALL@G.BBN.COM 	Re: test control  
C00147 00029	∂24-Jul-86  0254	@REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU 	test control    
C00158 00030	∂24-Jul-86  1053	berman@vaxa.isi.edu 	Re: test control   
C00160 00031	∂24-Jul-86  1148	marick%turkey@gswd-vms.ARPA 	Re: test control
C00166 00032	∂24-Jul-86  1546	berman@vaxa.isi.edu 	    
C00173 00033	∂24-Jul-86  1549	berman@vaxa.isi.edu 	test control  
C00183 00034	∂24-Jul-86  1740	FAHLMAN@C.CS.CMU.EDU 	FSD
C00186 00035	∂25-Jul-86  0047	@REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU 	test control    
C00192 00036	∂25-Jul-86  1036	berman@vaxa.isi.edu 	Re: FSD  
C00195 00037	∂25-Jul-86  1051	berman@vaxa.isi.edu 	Re: test control   
C00199 00038	∂25-Jul-86  1111	FAHLMAN@C.CS.CMU.EDU 	FSD
C00201 00039	∂25-Jul-86  1127	berman@vaxa.isi.edu 	Re: FSD  
C00203 00040	∂25-Jul-86  1254	FAHLMAN@C.CS.CMU.EDU 	FSD
C00205 00041	∂25-Jul-86  1541	berman@vaxa.isi.edu 	Re: FSD  
C00206 00042	∂26-Jul-86  1447	marick%turkey@gswd-vms.ARPA 	Test suite 
C00209 00043	∂28-Jul-86  1122	berman@vaxa.isi.edu 	Re: Test suite
C00212 00044	∂29-Jul-86  1220	@REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU 	Re: test control
C00221 00045	∂29-Jul-86  1629	berman@vaxa.isi.edu 	Add to list   
C00223 00046	∂31-Jul-86  0834	marick%turkey@gswd-vms.ARPA 	Lisp conference 
C00225 00047	∂31-Jul-86  1034	berman@vaxa.isi.edu 	Re: Lisp conference
C00227 00048	∂01-Aug-86  1348	berman@vaxa.isi.edu 	Conference    
C00229 00049	∂11-Aug-86  1122	berman@vaxa.isi.edu 	Thanks   
C00231 00050	∂13-Aug-86  1130	berman@vaxa.isi.edu 	Test Control  
C00233 00051	∂19-Aug-86  0039	@REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU 	Test Control    
C00237 00052	∂19-Aug-86  1135	berman@vaxa.isi.edu 	Re: Test Control   
C00240 00053	∂20-Aug-86  0604	hpfclp!hpfcjrd!diamant@hplabs.HP.COM 	Re: Test Control 
C00243 00054	∂21-Aug-86  1352	berman@vaxa.isi.edu 	Purpose of Test Suite   
C00246 00055	∂21-Aug-86  1738	FAHLMAN@C.CS.CMU.EDU 	Purpose of Test Suite  
C00248 00056	∂22-Aug-86  0124	@REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU 	Purpose of Test Suite
C00252 00057	∂22-Aug-86  0125	@REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU 	Re: Test Control
C00258 00058	∂22-Aug-86  1054	berman@vaxa.isi.edu 	Re: Test Control   
C00261 00059	∂24-Aug-86  1940	marick%turkey@gswd-vms.ARPA 	Purpose of Test Suite
C00265 00060	∂25-Aug-86  1221	berman@vaxa.isi.edu 	TEST MACRO    
C00277 00061	∂25-Aug-86  1225	berman@vaxa.isi.edu 	Test-Macro examples
C00282 00062	∂25-Aug-86  1255	berman@vaxa.isi.edu 	Purpose  
C00287 00063	∂27-Aug-86  0041	@REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU 	TEST MACRO 
C00294 00064	∂27-Aug-86  1211	berman@vaxa.isi.edu 	TEST MACRO - Fry's Comments  
C00302 00065	∂28-Aug-86  1308	@REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU 	TEST MACRO - Fry's Comments    
C00305 00066	∂08-Sep-86  1408	berman@vaxa.isi.edu 	TEST MACRO    
C00319 00067	∂09-Sep-86  1504	berman@vaxa.isi.edu 	Correct Test Macro 
C00333 00068	∂12-Sep-86  1259	berman@vaxa.isi.edu 	Test Stuff    
C00335 00069	∂12-Sep-86  1431	franz!binky!layer@kim.Berkeley.EDU 	Re: Test Stuff     
C00337 00070	∂16-Sep-86  1425	berman@vaxa.isi.edu 	Running Tests 
C00340 00071	∂16-Sep-86  1821	FAHLMAN@C.CS.CMU.EDU 	Running Tests
C00342 00072	∂17-Sep-86  1044	berman@vaxa.isi.edu 	Re: Running Tests  
C00344 00073	∂17-Sep-86  1437	berman@vaxa.isi.edu 	Running Tests 
C00348 00074	∂19-Sep-86  1334	berman@vaxa.isi.edu 	Floating Point Suite    
C00352 00075	∂19-Sep-86  1901	RWK@YUKON.SCRC.Symbolics.COM 	Floating Point Suite
C00356 00076	∂22-Sep-86  0857	hpfclp!paul@hplabs.HP.COM 	Floating Point Testing 
C00359 00077	∂22-Sep-86  0927	hpfclp!paul@hplabs.HP.COM 	Floating Point Tests   
C00362 00078	∂22-Sep-86  1155	fateman@renoir.Berkeley.EDU 	Re:  Floating Point Tests 
C00365 00079	∂22-Sep-86  1330	@DESCARTES.THINK.COM:gls@AQUINAS.THINK.COM 	Re:  Floating Point Tests 
C00368 00080	∂23-Sep-86  1119	berman@vaxa.isi.edu 	Floating Point Tests    
C00373 00081	∂23-Sep-86  1308	fateman@renoir.Berkeley.EDU 	Re:  Floating Point Tests 
C00375 00082	∂23-Sep-86  1348	berman@vaxa.isi.edu 	Re:  Floating Point Tests    
C00380 00083	∂24-Sep-86  0155	@REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU 	Re: Running Tests    
C00383 00084	∂24-Sep-86  1127	berman@vaxa.isi.edu 	Re: Running Tests, ERRSET    
C00385 ENDMK
C⊗;
Introduction
Welcome to the Common Lisp Validation Subgroup.
In order to mail to this group, send to the address:

		CL-Validation@su-ai.arpa

Capitalization is not necessary, and if you are directly on the ARPANET,
you can nickname SU-AI.ARPA as SAIL. An archive of messages is kept on
SAIL in the file:

			   CLVALI.MSG[COM,LSP]

You can read this file or FTP it away without logging in to SAIL.

To communicate with the moderator, send to the address:

		CL-Validation-request@su-ai.arpa

Here is a list of the people who are currently on the mailing list:

Person			Affiliation	Net Address

Richard Greenblatt	LMI		"rg%oz"@mc
Scott Fahlman		CMU		fahlman@cmuc
Eric Schoen		Stanford	schoen@sumex
Gordon Novak		Univ. of Texas	novak@utexas-20
Kent Pitman		MIT		kmp@mc
Dick Gabriel		Stanford/Lucid	rpg@sail
David Wile		ISI		Wile@ISI-VAXA
Martin Griss		HP		griss.hplabs@csnet-relay (I hope)
Walter VanRoggen	DEC		wvanroggen@dec-marlboro
Richard Zippel		MIT		rz@mc
Dan Oldman		Data General	not established
Larry Stabile		Apollo		not established
Bob Kessler		Univ. of Utah	kessler@utah-20
Steve Krueger		TI		krueger.ti-csl@csnet-relay
Carl Hewitt		MIT		hewitt-validation@mc
Alan Snyder		HP		snyder.hplabs@csnet-relay
Jerry Barber		Gold Hill	jerryb@mc
Bob Kerns		Symbolics	rwk@mc
Don Allen		BBN		allen@bbnf
David Moon		Symbolics	moon@scrc-stonybrook
Glenn Burke		MIT		GSB@mc
Tom Bylander		Ohio State	bylander@rutgers
Richard Soley		MIT		Soley@mc
Dan Weinreb		Symbolics	DLW@scrc-stonybrook
Guy Steele		Tartan		steele@tl-20a
Jim Meehan		Cognitive Sys.	meehan@yale
Chris Reisbeck		Yale		riesbeck@yale

The first order of business is for each of us to ask people we know who may
be interested in this subgroup if they would like to be added to this list.

Next, we ought to consider who might wish to be the chairman of this subgroup.
Before this happens, I think we ought to wait until the list is more nearly
complete. For example, there are no representatives of Xerox, and I think we
agree that LOOPS should be studied before we make any decisions.

∂23-Sep-84  1625	RPG  	Introduction  
To:   cl-validation@SU-AI.ARPA   
Welcome to the Common Lisp Validation Subgroup.
In order to mail to this group, send to the address:

		CL-Validation@su-ai.arpa

Capitalization is not necessary, and if you are directly on the ARPANET,
you can nickname SU-AI.ARPA as SAIL. An archive of messages is kept on
SAIL in the file:

			   CLVALI.MSG[COM,LSP]

You can read this file or FTP it away without logging in to SAIL.

To communicate with the moderator, send to the address:

		CL-Validation-request@su-ai.arpa

Here is a list of the people who are currently on the mailing list:

Person			Affiliation	Net Address

Richard Greenblatt	LMI		"rg%oz"@mc
Scott Fahlman		CMU		fahlman@cmuc
Eric Schoen		Stanford	schoen@sumex
Gordon Novak		Univ. of Texas	novak@utexas-20
Kent Pitman		MIT		kmp@mc
Dick Gabriel		Stanford/Lucid	rpg@sail
David Wile		ISI		Wile@ISI-VAXA
Martin Griss		HP		griss.hplabs@csnet-relay (I hope)
Walter VanRoggen	DEC		wvanroggen@dec-marlboro
Richard Zippel		MIT		rz@mc
Dan Oldman		Data General	not established
Larry Stabile		Apollo		not established
Bob Kessler		Univ. of Utah	kessler@utah-20
Steve Krueger		TI		krueger.ti-csl@csnet-relay
Carl Hewitt		MIT		hewitt-validation@mc
Alan Snyder		HP		snyder.hplabs@csnet-relay
Jerry Barber		Gold Hill	jerryb@mc
Bob Kerns		Symbolics	rwk@mc
Don Allen		BBN		allen@bbnf
David Moon		Symbolics	moon@scrc-stonybrook
Glenn Burke		MIT		GSB@mc
Tom Bylander		Ohio State	bylander@rutgers
Richard Soley		MIT		Soley@mc
Dan Weinreb		Symbolics	DLW@scrc-stonybrook
Guy Steele		Tartan		steele@tl-20a
Jim Meehan		Cognitive Sys.	meehan@yale
Chris Reisbeck		Yale		riesbeck@yale

The first order of business is for each of us to ask people we know who may
be interested in this subgroup if they would like to be added to this list.

Next, we ought to consider who might wish to be the chairman of this subgroup.
Before this happens, I think we ought to wait until the list is more nearly
complete. For example, there are no representatives of Xerox, and I think we
agree that LOOPS should be studied before we make any decisions.

∂02-Oct-84  1318	RPG  	Chairman 
To:   cl-validation@SU-AI.ARPA   
Now that we've basically got most everyone who is interested on the mailing
list, let's pick a chairman. I suggest that people volunteer for chairman.

The duties are to keep the discussion going, to gather proposals and review
them, and to otherwise administer the needs of the mailing list. I will
retain the duties of maintaining the list itself and the archives, but
otherwise the chairman will be running the show. 

Any takers?
			-rpg-

∂05-Oct-84  2349	WHOLEY@CMU-CS-C.ARPA 	Chairman     
Received: from CMU-CS-C.ARPA by SU-AI.ARPA with TCP; 5 Oct 84  23:49:33 PDT
Received: ID <WHOLEY@CMU-CS-C.ARPA>; Sat 6 Oct 84 02:49:51-EDT
Date: Sat, 6 Oct 1984  02:49 EDT
Message-ID: <WHOLEY.12053193572.BABYL@CMU-CS-C.ARPA>
Sender: WHOLEY@CMU-CS-C.ARPA
From: Skef Wholey <Wholey@CMU-CS-C.ARPA>
To:   Cl-Validation@SU-AI.ARPA
CC:   Dick Gabriel <RPG@SU-AI.ARPA>
Subject: Chairman 

I'd be willing to chair this mailing list.

I've been very much involved in most aspects of the implementation of Spice
Lisp, from the microcode to the compiler and other parts of the system, like
the stream system, pretty printer, and Defstruct.  A goal of ours is that Spice
Lisp port easily, so most of the system is written in Common Lisp.

Since our code is now being incorporated into many implementations, it's
crucial that it correctly implement Common Lisp.  A problem with our code is
that some of it has existed since before the idea of Common Lisp, and we've
spent many man-months tracking the changes to the Common Lisp specification as
the language evolved.  I am sure we've got bugs because I'm sure we've missed
"little" changes between editions of the manual.

So, I'm interested first in developing code that will aid implementors in
discovering pieces of the manual they may have accidentally missed, and second
in verifying that implementation X is "true Common Lisp."  I expect that the
body of code used for the first purpose will evolve into a real validation
suite as implementors worry about smaller and smaller details.

I've written little validation suites for a few things, and interested parties
can grab those from <Wholey.Slisp> on CMU-CS-C.  Here's what I have right now:

	Valid-Var.Slisp		Checks to see that all variables and constants
				in the CLM are there, and satisfy simple tests
				about what their values should be.

	Valid-Char.Slisp	Exercises the functions in the Characters
				chapter of the CLM.

	Valid-Symbol.Slisp	Exercises the functions in the Symbols chapter
				of the CLM.

Some of the tests in the files may seem silly, but they've uncovered a few bugs
in both Spice Lisp and the Symbolics CLCP.

I think more programs that check things out a chapter (or section) at a time
would be quite valuable, and I'm willing to devote some time to coordinating
such programs into a coherent library.

--Skef

∂13-Oct-84  1451	RPG  	Chairman 
To:   cl-validation@SU-AI.ARPA   

Gary Brown of DEC, Ellen Waldrum of TI, and Skef Wholey of CMU
have volunteered to be chairman of the Validation subgroup. Perhaps
these three people could decide amongst themselves who should be
chairman and let me know by October 24.

			-rpg-

∂27-Oct-84  2159	RPG  	Hello folks   
To:   cl-validation@SU-AI.ARPA   

We now have a chairman of the charter:  Bob Kerns of Symbolics.  I think
he will make an excellent chairman.  For your information I am including
the current members of the mailing list.

I will now let Bob take over responsibility for the discussion.

Dave Matthews		HP		"hpfclp!validation%hplabs"@csnet-relay
Ken Sinclair 		LMI		"khs%mit-oz"@mit-mc
Gary Brown		DEC		Brown@dec-hudson
Ellen Waldrum		TI		WALDRUM.ti-csl@csnet-relay
Skef Wholey		CMU		Wholey@cmuc
John Foderaro		Berkeley	jkf@ucbmike.arpa
Cordell Green		Kestrel		Green@Kestrel
Richard Greenblatt	LMI		"rg%oz"@mc
Richard Fateman		Berekely	fateman@berkeley
Scott Fahlman		CMU		fahlman@cmuc
Eric Schoen		Stanford	schoen@sumex
Gordon Novak		Univ. of Texas	novak@utexas-20
Kent Pitman		MIT		kmp@mc
Dick Gabriel		Stanford/Lucid	rpg@sail
David Wile		ISI		Wile@ISI-VAXA
Martin Griss		HP		griss.hplabs@csnet-relay (I hope)
Walter VanRoggen	DEC		wvanroggen@dec-marlboro
Richard Zippel		MIT		rz@mc
Dan Oldman		Data General	not established
Larry Stabile		Apollo		not established
Bob Kessler		Univ. of Utah	kessler@utah-20
Steve Krueger		TI		krueger.ti-csl@csnet-relay
Carl Hewitt		MIT		hewitt-Validation@mc
Alan Snyder		HP		snyder.hplabs@csnet-relay
Jerry Barber		Gold Hill	jerryb@mc
Bob Kerns		Symbolics	rwk@mc
Don Allen		BBN		allen@bbnf
David Moon		Symbolics	moon@scrc-stonybrook
Glenn Burke		MIT		GSB@mc
Tom Bylander		Ohio State	bylander@rutgers
Richard Soley		MIT		Soley@mc
Dan Weinreb		Symbolics	DLW@scrc-stonybrook
Guy Steele		Tartan		steele@tl-20a
Jim Meehan		Cognitive Sys.	meehan@yale
Chris Reisbeck		Yale		riesbeck@yale

∂27-Oct-84  2202	RPG  	Correction    
To:   cl-validation@SU-AI.ARPA   

The last message about Bob Kerns had a typo in it. He is chairman
of the validation subgroup, not the charter subgroup. Now you
know my secret abot sending out these announcements!
			-rpg-

∂02-Nov-84  1141	brown@DEC-HUDSON 	First thoughts on validation    
Received: from DEC-HUDSON.ARPA by SU-AI.ARPA with TCP; 2 Nov 84  11:38:53 PST
Date: Fri, 02 Nov 84 14:34:24 EST
From: brown@DEC-HUDSON
Subject: First thoughts on validation
To: cl-validation@su-ai
Cc: brown@dec-hudson

I am Gary Brown and supervise the Lisp Development group at Digital
I haven't seen any mail about validation yet, so this is to get things
started.

I think there are three areas we need to address:

 1) The philosophy of validation - What are we going to validate and
    what are we explicitly not going to check?

 2) The validation process - What kind of mechanism should be used to
    implement the validation suite, to maintain it, to update it and
    actually validate Common Lisp implementations?

 3) Creation of an initial validation suite - I believe we could disband
    after reporting on the first two areas, but it would be fun if we
    could also create a prototype validation suite.  Plus, we probably
    can't do a good job specifying the process if we haven't experimented.

Here are my initial thoughts about these three areas:

PHILOSOPHY
We need to clearly state what the validation process is meant to 
accomplish and what it is not intended to accomplish.  There are
aspects of a system of interest to users which we cannot validate.
For example, language validation should not be concerned with:
 - The performance/efficiency of the system under test.  There should
   no timing tests built into the validation suite.
 - The robustness of the system.  How it responds to errors, the
   usefulness of error messages should not be a consideration
   in the design of tests.
 - The type of support tools such as debuggers and editors should
   not be tested or reported on.
In general, the validation process should report only on  whether or
not the implementation is a legal common lisp as defined by the
common lisp reference manual.   Any other information derived from
the testing process should not be made public.  The testing process
cannot produce information which can be used by vendors as advertisements
for their implementations or to degrade other implementations.

We need to state how we will test language elements which are ill-defined
in the reference manual.  For example, if the manual states that it
is "an error" to do something, then we cannot write a test for that
situation.  However, if the manual states that an "error is signaled"
then we should verify that. 

There are several functions in the language whose action is implementation
dependent.  I don't see how we can write a test for INSPECT or for
the printed appearance when *PRINT-PRETTY* is on (however, we can
insure that what is printed is still READable).

PROCESS
We need to describe a process  for language validation.  We could
have a very informal process where the test programs are publicly
available and  potential customers acquire and run the tests.  However, 
I think we need, at least initially, a more formal process.

A contract should be written (with ARPA money?) to some third party
software house to produce and maintain the validation programs, to
execute the tests, and to report the results.  I believe the ADA
validation process works something like this:
 - Every six months a "field test" version of the validation suite
   suite is produced (and the previous field test version is made the
   official version).  Interested parties can acquire the programs
   run them and comment back to SofTech.
 - When a implementation wants to validate, it tells some government
   agency, gets the current validation suite, runs it and send all
   the output back.
 - An appointment is then set up and people the validation agency
   come vendor and run all the tests themselves, again bundle up
   the output and take it away.
 - Several weeks later, the success of the testing is announced.

This seems like a reasonable process to me.  We might want to modify
it by:
 - Having the same agency that produced the tests, validate their results.
 - Getting rid of the on site visit requirement;  it's expensive I
   think the vendor needs to include a check for $10,000 to when
   they request validation.  That might be hard for universities
   to justify.

Some other things I think need to set up are:
 - A good channel from the test producers to the language definers 
   for quick clarifications and to improve the manual
 - Formal ways to complain about the contents of test
 - Ways for new tests to be suggested.  Customers are sure to
   find bugs in validated systems, so it would be invaluable if
   they could report these as holes in the test system.

A FIRST CUT
To do a good job defining the validation process, I think we need to
try to produce a prototype test system.  At Digital we have already
expended considerable effort writing tests for VAX LISP and I assume that
everyone else implementing Common Lisp done the same.  Currently, our 
test software is considered proprietary information.  However, I believe
that we would be willing to make it public domain if the other vendors
were willing to do the same. 

If some kind of informal agreement can be made, we should try to specify
the form of the tests, have everyone convert their applicable tests
to this form and then exchange tests.  This will surely generate
a lot of information on how the test system should be put together.

-Gary Brown

∂04-Nov-84  0748	FAHLMAN@CMU-CS-C.ARPA 	Second thoughts on validation   
Received: from CMU-CS-C.ARPA by SU-AI.ARPA with TCP; 4 Nov 84  07:47:00 PST
Received: ID <FAHLMAN@CMU-CS-C.ARPA>; Sun 4 Nov 84 10:47:06-EST
Date: Sun, 4 Nov 1984  10:47 EST
Message-ID: <FAHLMAN.12060893556.BABYL@CMU-CS-C.ARPA>
Sender: FAHLMAN@CMU-CS-C.ARPA
From: "Scott E. Fahlman" <Fahlman@CMU-CS-C.ARPA>
To:   cl-validation@SU-AI.ARPA
Subject: Second thoughts on validation


I agree with all of Gary Brown's comments on the proper scope of
validation.  The only point that may cause difficulty is the business
about verifying that an error is signalled in all the places where this
is specified.  The problem there is that until the Error subgroup does
its thing, we have no portable way to define a Catch-All-Errors handler
so that the valiadtion program can intercept such signals and proceed.
Maybe we had better define such a hook right away and require that any
implementation that wants to be validated has to support this, in
addition to whatever more elegant hierarchical system eventually gets
set up.  The lack of such a unversal ERRSET mechanism is clearly a
design flaw in the language.  We kept putting this off until we could
figure out what the ultimate error handler would look like, and so far
we haven't done that.

As for the process, I think that the validation suite is naturally going
to be structured as a series of files, each of which contains a function
that will test some particular part of the language: a chapter's worth
or maybe just some piece of a chapter such as lambda-list functionality.
That way, people can write little chunks of validation without being
overwhelmend by the total task.  Each such file should have a single
entry point to a master function that runs everything else in the file.
These things should print out an informative message whenever it notices
an implementation error.  They can also print out some other commentary
at the implementor's discretion, but probably there should be a switch
that will muzzle anything other than hard errors.  Finally, there should
be some global switch that starts out as NIL and gets set to T whenever
some module finds a clear error.  If this is still NIL after every
module has done its testing, the implementation is believed to be
correct.  I was going to suggest a counter for this, but then we might
get some sales rep saying that Lisp X has 14 validation errors and our
Lisp only has 8.  That would be bad, since some errors are MUCH more
important than others.

To get the ball rolling, we could begin collecting public-domain
validation modules in some place that is easily accessible by arpanet.
As these appear, we can informally test various implementations against
them to smoke out any inconsistencies or disagreements about the tests.
I would expect that when this starts, we'll suddenly find that we have a
lot of little questions to answer about the language itself, and we'll
have to do our best to resolve those questions quickly.  Once we have
reached a consensus that a test module is correct, we can add it to some
sort of "approved" list, but we should recognize that, initially at
least, the testing module is as likely to be incorrect as the
implementation.

As soon as possible, this process of maintaining and distributing the
validation suite (and filling in any holes that the user community does
not fill voluntarily) should fall to someone with a DARPA contract to do
this.  No formal testing should begin until this organization is in
place and until trademark protection has been obtained for "DARPA
Validated Common Lisp" or whatever we are going to call it.  But a lot
can be done informally in the meantime.

I don't see a lot of need for expensive site visits to do the
validating.  It certainly doesn't have to be a one-shot win-or-lose
process, but can be iterative until all the tests are passed by the same
system, or until the manufacturer decides that it has come as close as
it is going to for the time being.  Some trusted (by DARPA), neutral
outside observer needs to verify that the hardware/software system in
question does in fact run the test without any chicanery, but there are
all sorts of ways of setting that up with minimal bureaucratic hassle.
We should probably not be in the business of officially validating
Common Lisps on machines that are still under wraps and are not actually
for sale, but the manufacturers (or potential big customers) could
certainly run the tests for themselves on top-secret prototypes and be
ready for official validation as soon as the machine is released to the
public.

I'm not sure how to break the deadlock in which no manufacturer wants to
be the first to throw his proprietary validation software into the pot.
Maybe this won't be a problem, if one of the less bureaucratic companies
just decides to take the initiative here.  But if there is such a
deadlock, I suppose the way to proceed is first to get a list of what
each company proposes to offer, then to Get agreement from each that it
will donate its code if the others do likewise, then to get some lawyer
(sigh!) to draw up an agreement that all this software will be placed in
the public domain on a certain date if all the other companies have
signed the agreement by that date.  It would be really nice to avoid
this process, however.  I see no advantage at all for a company to have
its own internal validation code, since until that code ahs been
publically scrutinized, there is no guarantee that it would be viewed as
correct by anyone else or that it will match the ultimate standard.

-- Scott

∂07-Nov-84  0852	brown@DEC-HUDSON 	test format 
Received: from DEC-HUDSON.ARPA by SU-AI.ARPA with TCP; 7 Nov 84  08:43:57 PST
Date: Wed, 07 Nov 84 11:40:37 EST
From: brown@DEC-HUDSON
Subject: test format
To: cl-validation@su-ai

First, I would hope that submission of test software will not require
any lawyers.  I view this as a one-time thing, the only purpose of which
is to get some preliminary test software available to all implementations,
and to give this committee some real data on language validation.
The creation and maintenance of the real validation software should be
the business of the third party funded to do this.  I would hope that
they can use what we produce, but that should not be a requirement.

If we are going to generate some preliminary tests, we should develop
a standard format for the tests.   I have attached a condensed and
reorganizied version of the "developers guide" for our test system.
Although I don't think our test system is particularly elegant, it
basically works.  There are a few things I might change someday:

  - The concept of test ATTRIBUTES is not particularly useful.  We
    have never run tests by their attributes but always run a whole
    file full of them.  

  - The expected result is not evaluated (under the assumption that
    if it were, most of the time you would end up quoting it.  That
    is sometimes cumbersome.

  - There is not a builtin way to check multiple value return.  You
    make the test-case do a multiple-value-list and look at the list.
    That is sometimes cumbersome and relatively easy to fix.

  - We haven't automated the analysis of the test results.

  - Our test system is designed to handle lot of little tests and I
    think that it doesn't simplify writing complex tests.  I have
    never really thought about what kind of tools would be useful.

If we want to try to build some tests, I am willing to change our test
system to incorporate any good ideas and make it available.

-Gary


!

     1  A SAMPLE TEST DEFINITION

          Here is the test for GET.

     (def-lisp-test (get-test :attributes (symbols get)
                              :locals (clyde foo))
       "A test of get.  Uses the examples in the text."
       ((fboundp 'get) ==> T)
       ((special-form-p 'get) ==> NIL)
       ((macro-function 'get) ==> NIL)
       ((progn
           (setf (symbol-plist 'foo) '(bar t baz 3 hunoz "Huh?"))
           (get 'foo 'bar))
         ==> T)
       ((get 'foo 'baz) ==> 3)
       ((get 'foo 'hunoz) ==> "Huh?")
       ((prog1
           (get 'foo 'fiddle-sticks)
           (setf (symbol-plist 'foo) NIL))
         ==> NIL)
       ((get 'clyde 'species) ==> NIL)
       ((setf (get 'clyde 'species) 'elephant) ==> elephant)
       ((get 'clyde) <error>)
       ((prog1
           (get 'clyde 'species)
           (remprop 'clyde 'species))
         ==> elephant)
       ((get) <error>)
       ((get 2) <error>)
       ((get 4.0 'f) <error>))
     Notice that everything added to the property list is taken off  again,
     so  that  the  test's  second run will also work.  Notice also that it
     isn't wise to start by testing for

             ((get 'foo 'baz)  ==> NIL)

     as someone may have decided to give FOO the property  BAZ  already  in
     another test.



     2  DEFINING LISP TESTS

          Tests are defined with the DEF-LISP-TEST macro.

     DEF-LISP-TEST {name | (name &KEY :ATTRIBUTES :LOCALS)}           [macro]
                   [doc-string] test-cases







                                   - 1 -
!
                                                                Page 2


     3  ARGUMENTS TO DEF-LISP-TEST

     3.1  Name

          NAME is the name of the  test.   Please  use  the  convention  of
     calling  a  test FUNCTION-TEST, where FUNCTION is the name of (one of)
     the function(s) or variable(s) tested by that test.  The  symbol  name
     will  have  the  expanded test code as its function definition and the
     following properties:

           o  TEST-ATTRIBUTES - A list of all the attribute  symbols  which
              have this test on their TEST-LIST property.

           o  TEST-DEFINITION -  The  expanded  test  code.   Normally  the
              function  value  of  the  test is compiled; the value of this
              property is EVALed to run the test interpreted.

           o  TEST-LIST - The list of tests  with  NAME  as  an  attribute.
              This list will contain at least NAME.




     3.2  Attributes

          The value of :ATTRIBUTES is a list of  "test  attributes".   NAME
     will  be  added to this list.  Each symbol on this list will have NAME
     added to the list which is the value of its TEST-LIST property.



     3.3  Locals

          Local variables can be specified  and  bound  within  a  test  by
     specifying the :LOCALS keyword followed by a list of the for used in a
     let var-list.  For example, specifying the list (a b c)  causes  a,  b
     and c each to be bound to NIL during the run of the test; the list ((a
     1) (b 2) (c 3)) causes a to be bound to 1, b to 2, and c to  3  during
     the test.



     3.4  Documentation String

          DOC-STRING is a normal documentation string of documentation type
     TESTS.   To  see  the documentation string of a function FOO-TEST, use
     (DOCUMENTATION 'FOO-TEST 'TESTS).   The  documentation  string  should
     include  the  names of all the functions and variables to be tested in
     that test.  Mention if there is anything missing from the  test,  e.g.
     tests of the text's examples.




                                   - 2 -
!
                                                                Page 3


     3.5  Test Cases

          TEST-CASES (the remainder of the body) is a series of test cases.
     Each  test  case  is  a  list of a number of elements as follows.  The
     order specified here must hold.



     3.5.1  Test Body -

          A form to be executed as the test body.  If it  returns  multiple
     values, only the first will be used.



     3.5.2  Failure Option -

          The symbol <FAILURE> can be used to indicate that the  test  case
     is  known  to  cause  an  irrecoverable  error  (e.g.  it goes into an
     infinite loop).  When the test case is run, the code is not  executed,
     but  a  message  is  printed  to  remind you to fix the problem.  This
     should be followed by normal result options.  Omission of this  option
     allows the test case to be run normally.



     3.5.3  Result Options -



     3.5.3.1  Comparison Function And Expected Result -

          The Test Body will be compared with the Expected Result using the
     function EQUAL if you use
             ==> expected-result
     or with the function you specify if you use
             =F=> function expected-result
     There MUST be white-space after ==> and =F=>, as they are  treated  as
     symbols.   Notice  that neither function nor expected-result should be
     quoted.  "Function" must be defined; an explicit lambda form is legal.
     "Expected-Result"  is the result you expect in evaluating "test-body".
     It is not evaluated.  The comparison function will be called  in  this
     format:
             (function test-body 'expected-value)



     3.5.3.2  Errors -

          <ERROR> - The test is expected to signal  an  error.   This  will
     normally  be  used  with  tests which are expected to generate errors.
     This is an alternative  to  the  comparison  functions  listed  above.
     There should not be anything after the symbol <ERROR>.  It checks that

                                   - 3 -
!
                                                                Page 4


     an error is signaled when the test case is run interpreted,  and  that
     an  error  is  signaled  either  during the compilation of the case or
     while the case is being evaluated when the test is run compiled.



     3.5.3.3  Throws -

          =T=> - throw-tag result - The test is expected to  throw  to  the
     specified  tag  and  return  something  EQUAL to the specified result.
     This clause is only required for a small number of tests.  There  must
     be  a  space  after  =T=>,  as  it is treated as a symbol.  This is an
     alternative to the functions given above.  This does not work compiled
     at the moment, due to a compiler bug.



     4  RUNNING LISP TESTS

          The function RUN-TESTS can be called with no arguments to run all
     the  tests,  with  a  symbol which is a test name to run an individual
     test, or with a list of symbols, each of which is an attribute, to run
     all  tests  which have that attribute.  Remember that the test name is
     always added to the attribute list automatically.

          The special variable *SUCCESS-REPORTS* controls whether  anything
     will be printed for successful test runs.  The default value is NIL.

          The special variable *START-REPORTS* controls whether  a  message
     containing  the  test  name  will be printed at the start of each test
     execution.  The default value is NIL.  If *SUCCESS-REPORTS* is T, this
     variable is treated as T also.

          The special variable *RUN-COMPILED-TESTS*  controls  whether  the
     "compiled"  versions  of the specified tests will be run.  The default
     value is T.

          The special variable *RUN-INTERPRETED-TESTS* controls whether the
     "interpreted"  versions  of  the  specified  tests  will  be run.  The
     default value is T.

          The special  variable  *INTERACTIVE*  controls  whether  you  are
     prompted  after  unexpected errors for whether you would like to enter
     debug.   It  uses  yes-or-no-p.   To  continue  running  tests   after
     enterring  debug  after  one  of  these  prompts,  type  CONTINUE.  If
     *INTERACTIVE* is set to T, the test system  will  do  this  prompting.
     The default value is NIL.



     5  GUIDE LINES FOR WRITING TEST CASES

          1.  The first several test cases in each test should be tests for

                                   - 4 -
!
                                                                Page 5


     the  existence  and correct type of each of the functions/variables to
     be    tested    in    that    test.     A    variable,     such     as
     *DEFAULT-PATHNAME-DEFAULTS*, should have tests like these:

             ((boundp '*default-pathname-defaults*) ==> T)
             ((pathnamep *default-pathname-defaults*) ==> T)


          A function, such as OPEN, should have these tests:

             ((fboundp 'open) ==> T)
             ((macro-function 'open) ==> NIL)
             ((special-form-p 'open) ==> NIL)


          A macro, such as WITH-OPEN-FILE, should have these tests:

             ((fboundp 'with-open-file) ==> T)
             ((not (null (macro-function 'with-open-file))) T)

     Note that, as MACRO-FUNCTION returns the function definition (if it is
     a  macro)  or  NIL  (if  it  isn't  a  macro),  we  use NOT of NULL of
     MACRO-FUNCTION here.  Note also that a macro may  also  be  a  special
     form,  so  SPECIAL-FORM-P  is not used:  we don't care what the result
     is.

          A special form, such as SETQ, should have these tests:

             ((fboundp 'setq) ==> T)
             ((not (null (special-form-p 'setq))) T)

     Again, note that SPECIAL-FORM-P returns the function definition (if it
     is  a  special  form)  or  NIL (if it isn't), so we use NOT of NULL of
     SPECIAL-FORM-P here.  Note also that we don't care  if  special  forms
     are also macros, so MACRO-FUNCTION is not used.



          2.  The next tests  should  be  simple  tests  of  each  of  your
     functions.   If  you  start  right  in  with complicated tests, it can
     become difficult to unravel simple bugs.  If possible, create one-line
     tests which only call one of the functions to be tested.

          E.g.  for +:

             ((+ 2 10) ==> 12)




          3.  Test each of the examples given in the Common Lisp Manual.



                                   - 5 -
!
                                                                Page 6


          4.  Then test more complicated cases.  Be sure to test both  with
     and  without each of the optional arguments and keyword arguments.  Be
     sure to test what the manual SAYS, not what you know that we do.



          5.  Then test for obvious cases which  should  signal  an  error.
     Obvious  things  to test are that it signals an error if there are too
     few or too many arguments, or if the argument is of  the  wrong  type.
     E.g.  for +

             ((+ 2 'a) <ERROR>)




     6  HINTS

          Don't try to be  clever.   What  we  need  first  is  a  test  of
     everything.   If  we decide that we need "smarter" tests later, we can
     go back and embellish.  Right now we need to have a  test  that  shows
     whether the functions and variables we are supposed to have are there,
     and that tells whether  at  first  glance  the  function  is  behaving
     properly.  Even with simple tests this test system will be huge.

          Don't write long test cases if you can help it.  Think about  the
     kind  of error messages you might get and how easy it will be to debug
     them.

          Remember that, although the test system guarantees that the  test
     cases  within  one  test are run in the order defined, no guarantee is
     made that your tests will be run  in  the  order  in  which  they  are
     loaded.   Do  not  write  tests which depend on other tests having run
     before them.

          It is now possible to check for cases which should signal errors;
     please do.

          I have found it easiest to compose and  then  debug  tests  which
     have no more than 20 cases.  Once a test works I often add a number of
     cases, however, and I do have some  with  over  100  cases.   However,
     sometimes  tests  with as few as 10 cases can be difficult to unravel,
     if, for example, the test won't compile properly.  Therefore, if there
     is  a  group  of related functions which require many tests each, I am
     more likely to have a separate test for each function.  If testing one
     function   is   made   more   easy   by  also  testing  another  (e.g.
     define-logical-name, translate-logical-name and  delete-logical-name),
     it  can  be advantageous to test them together.  It is not a good idea
     to make the test cases or returned values very large, however.   Also,
     when many functions are tested in the same test, it is likely that the
     tests can get complicated to debug and/or that some aspect of  one  of
     the  functions  tested  could be forgotten.  Therefore, I would prefer
     that you NOT write, say, four or five tests, each of which is supposed

                                   - 6 -
!
                                                                Page 7


     to  test  all  of  the  functions  in one part of the manual.  I would
     prefer that a function have a test which is dedicated to it  (even  if
     it  is  shared with one or two other functions).  This means that some
     functions will be used not just in tests of themselves,  but  also  in
     tests of related functions; but that is ok.

          Remember that each test will be run twice by the test system.  So
     if your test changes something, change it back.



     7  EXAMPLES

     7.1  Comparison Function

          If you use the "( code =F=> comparison-function result )" format,
     the result is now determined by doing (comparison-function code (quote
     result)).

             (2 =F=> < 4)   <=>   (< 2 4)
             (2 =F=> > 4)   <=>   (> 2 4)

     Notice that the new comparison function you introduce is unquoted.

          You may also use an explicit lambda form.  For example,

             (2 =F=> (lambda (x y) (< x y)) 4)   <=>  (< 2 4)




     7.2  Expected Result

          Remember  that  the  returned  value  for  a  test  case  is  not
     evaluated;  so  "==>  elephant" means is it EQUAL to (quote elephant),
     not to the value of elephant.

          Consequently, this is in error:

             (mapcar #'1+ (list 0 1 2 3)) ==> (list 1 2 3 4))

     and this is correct:

             (mapcar #'1+ (list 0 1 2 3)) ==> (1 2 3 4))


                          *Tests Return Single Values*

             A test returns exactly one value; a test of a function
             which returns multiple values must be written as:

                     (MULTIPLE-VALUE-LIST form)


                                   - 7 -
!
                                                                Page 8


                             *Testing Side Effects*

             A test of a side effecting function must  verify  that
             the  function  both  returns  the  correct  value  and
             correctly causes the side effect.  The following  form
             is an example of a body that does this:

                 ((LET (FOO) (LIST (SETF FOO '(A B C)) FOO)))
                     ==> ((A B C) (A B C)))





     7.3  Throw Tags

          The throw tag is also not evaluated.

          You must have either "==> <result>" or "=F=>  comparison-function
     <result>" or "=T=> throw-tag <result>" or "<ERROR>" in each test case.
     Remember that you may no longer use <-T- or <-S-.  For  example,  this
     would be correct:

             ((catch 'samson
                  (throw 'delilah 'scissors))
               =T=> delilah scissors)

     This test case would cause an unexpected error:

             ((catch 'samson
                     (throw 'delilah 'scissors))
               ==> scissors)




     7.4  Expected Failures

          Any test case can have the <FAILURE> option inserted to  indicate
     that  the  code  should not be run.  For example, these test cases are
     innocuous:
             ((dotimes (count 15 7)
                  (setf count (1- count)))
               <failure> ==> 7)

             ((dotimes (count 15 7)
                  (setf count (1- count)))
               <failure> =F=> <= 7)

             ((throw 'samson (dotimes (count 15 7)
                                 (setf count (1- count))))
               <failure> =T=> samson 7)


                                   - 8 -
!
                                                                Page 9


             ((car (dotimes (count 15 7)
                       (setf count (1- count))))
               <failure> <error>)
     Obviously, you are not expected to introduce infinite loops  into  the
     test cases deliberately.



     7.5  Sample Error And Success Reports

          A test with cases which all succeed will run with  no  output  if
     *SUCCESS-REPORTS*  is  NIL;  if  it is set to T, output will look like
     this:
     ************************************************************************ 
     Starting: GET-TEST 
     A test of get.  Uses the examples in the text.

     TESTS:GET-TEST succeeded in compiled cases
      1 2 3 4 5 6 7 8 9 10 11 12 13 14

     TESTS:GET-TEST succeeded in interpreted cases
      1 2 3 4 5 6 7 8 9 10 11 12 13 14


          If a test case evaluates properly but returns the wrong value, an
     error   report   will   be   made   irrespective  of  the  setting  of
     *SUCCESS-REPORTS*.  The  reports  include  the  test  case  code,  the
     expected  result, the comparison function used, and the actual result.
     For example, if you run this test:

             (def-lisp-test (+-test :attributes (numbers +))
               ((+) ==> 0)
               ((+ 2 3) ==> 4)
               ((+ -4 -5) =F=> >= 0))

     The second and third cases are wrong, so there  will  be  bug  reports
     like this:
     ←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←
     TESTS:+-TEST 
     Error in compiled case 2.

     Expected: (+ 2 3)

     to be EQUAL to: 4

     Received: 5
     -----------------------------------------------------------------------

     ←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←
     TESTS:+-TEST
     Error in compiled case 3.

     Expected: (+ -4 -5)

                                   - 9 -
!
                                                               Page 10


     to be >= to: 0

     Received: -9
     ------------------------------------------------------------------------

          Unexpected errors cause a report which includes  the  code  which
     caused  the  error,  the expected result, the error condition, and the
     error message from the error system.  As with other errors, these bugs
     are  reported  regardless  of  the  setting of *SUCCESS-REPORTS*.  For
     example:

             (def-lisp-test (=-test :attributes (numbers =))
               ((fboundp '=) ==> T)
               ((macro-function '=) ==> NIL)
               ((special-form-p '=) ==> NIL))

     The following report is given if MACRO-FUNCTION is undefined:

     ←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←← 
     TESTS:=-TEST compiled case 2 caused an unexpected 
      correctable error in function *EVAL. 

     Expected: (MACRO-FUNCTION '=) 

     to be EQUAL to: NIL 

      The error message is: 
     Undefined function: MACRO-FUNCTION.

     -----------------------------------------------------------------------




     8  RUNNING INDIVIDUAL TEST CASES

          The interpreted version of a test case can be  run  individually.
     Remember that if any variables are used which are modified in previous
     test cases, the results will not be "correct"; for example, any  local
     variables bound for the test with the :LOCALS keyword are not bound if
     a test case is run with this function.  The format is
        (RUN-TEST-CASE test-name test-case)
     Test-name is a symbol; test-case is an integer.



     9  PRINTING TEST CASES

          There are some new functions:

        (PPRINT-TEST-DEFINITION name)
        (PPRINT-TEST-CASE name case-number)
        (PPRINT-ENTIRE-TEST-CASE name case-number)

                                  - 10 -
!
                                                               Page 11


        (PPRINT-EXPECTED-RESULT name case-number)


          In each case, name is a  symbol.   In  the  latter  three  cases,
     case-number is a positive integer.

          PPRINT-TEST-DEFINITION pretty prints the expanded test code for a
     test.

          PPRINT-TEST-CASE pretty prints the test code for the  body  of  a
     test case; i.e.  the s-expression on the left of the arrow.

          PPRINT-ENTIRE-TEST-CASE pretty prints the  entire  expanded  test
     code   for   the  case  in  question,  i.e.   rather  more  than  does
     PPRINT-TEST-CASE and rather less than PPRINT-TEST.

          PPRINT-EXPECTED-RESULT pretty prints the expected result for  the
     test case specified.  This cannot be done for a case which is expected
     to signal an error, as in that case there is no comparison of expected
     and actual result.


































                                  - 11 -

∂09-Nov-84  0246	RWK@SCRC-STONY-BROOK.ARPA 	Hello   
Received: from SCRC-STONY-BROOK.ARPA by SU-AI.ARPA with TCP; 9 Nov 84  02:46:18 PST
Received: from SCRC-HUDSON by SCRC-STONY-BROOK via CHAOS with CHAOS-MAIL id 123755; Thu 8-Nov-84 21:32:33-EST
Date: Thu, 8 Nov 84 21:33 EST
From: "Robert W. Kerns" <RWK@SCRC-STONY-BROOK.ARPA>
Subject: Hello
To: cl-validation@SU-AI.ARPA
Message-ID: <841108213326.0.RWK@HUDSON.SCRC.Symbolics.COM>

Hello.  Welcome to the Common Lisp Validation committee.  Let me
introduce myself, general terms, first.

I am currently the manager of Lisp System Software at Symbolics,
giving me responsibility for overseeing our Common Lisp effort,
among other things.  Before I became a manager, I was a developer
at Symbolics.  In the past I've worked on Macsyma, MacLisp and NIL
at MIT, and I've worked on object-oriented systems on these systems.

At Symbolics, we are currently preparing our initial Common Lisp
offering for release.  Symbolics has been a strong supporter of Common
Lisp in its formative years, and I strongly believe that needs to
continue.  Why do I mention this?  Because I think one form of support
is to contribute our validation tests as we collect and organize them.

I urge other companies to do likewise.  I believe we all have
far more to gain than to lose.  I believe there will be far more
validation code available in the aggregate than any one company
will have available by itself.  In addition, validation tests from
other places have the advantage of bringing a fresh perspective
to your testing.  It is all too easy to test for the things you
know you made work, and far too difficult to test for the more
obscure cases.

As chairman, I see my job as twofold:

1)  Facilitate communication, cooperation, and decisions.
2)  Facilitate the implementation of decisions of the group.

Here's an agenda I've put together of things I think we
need to discuss.  What items am I missing?  This nothing
more than my own personal agenda to start people thinking.

First, the development issues:

1)  Identify what tests are available.  So far, I know of
the contribution by Skef Wholey.  I imagine there will be
others forthcoming once people get a chance to get them
organized.  (Myself included).

2)  Identify a central location to keep the files.  We
need someone on the Arpanet to volunteer some space for
files of tests, written proposals, etc.  Symbolics is
not on the main Arpanet currently, so we aren't a good
choice.  Volunteers?

    Is there anyone who cannot get to files stored on
the Arpanet?  If so, please contact me, and I'll arrange
to get files to you via some other medium.

3)  We need to consider the review process for proposed
tests.  How do we get tests reviewed by other contributors?
We can do it by FTPing the files to the central repository
and broadcasting a request to evaluating it to the list.
Would people prefer some less public form of initial evaluation?

4)  Test implementation tools.  We have one message from Gary Brown
describing his tool.  I have a tool written using flavors that I
hope to de-flavorize and propose.  I think we would do well to standardize
in this area as much as possible.

5)  Testing techniques.  Again, Gary Brown has made a number of excellent
suggestions here.  I'm sure we'll all be developing experience that we
can share.

6)  What areas do we need more tests on?

And there are a number of political, proceedural, and policy issues that
need to be resolved.

7)  Trademark/copyright issues.  At Monterey, DARPA voluntered to
investigate trademarking and copyrighting the validation suite.
RPG: have you heard anything on this?

8)  How do we handle disagreements about the language?  This was
discussed at the Monterey meeting, and I believe the answer, if
we can't work it out, we ask the Common Lisp mailing list, and
especially the Gang of Five, for a clarification.  At any rate,
I don't believe it is in our charter to resolve language issues.
I expect we will IDENTIFY a lot of issues, however.

I don't think the rest of these need to be decided any time soon.
We can discuss them now, or we can wait.

9)  How does a company (or University) get a Common Lisp implementation
validated, and what does it mean?  We can discuss this now, but I
don't think we have to decide it until we produce our first validation
suite.

10) How do we distribute the validation suites?  I hope we can do most
of this via the network.  I am willing to handle distributing it to
people off the network until it gets too expensive in time or tapes.
We will need a longer-term solution to this, however.

11) Longer term maintenance of the test suites.  I think having a
commercial entity maintain it doesn't make sense until we get the
language into a more static situation.  I don't think there is
even agreement that this is the way it should work, for that
matter, but we have plenty of time to discuss this, and the situation
will be changing in the meantime.

So keep those cards and letters coming, folks!

∂12-Nov-84  1128	brown@DEC-HUDSON 	validation process    
Received: from DEC-HUDSON.ARPA by SU-AI.ARPA with TCP; 12 Nov 84  11:25:11 PST
Date: Mon, 12 Nov 84 14:26:14 EST
From: brown@DEC-HUDSON
Subject: validation process
To: cl-validation@su-ai

I am happy to see that another vendor (Symbolics) is interested in sharing
tests.  I too believe we all much to gain by this kind of cooperation.

Since it seems that we will be creating and running tests, I would like
to expand a bit on an issue I raised previously - the ethics of validation.
A lot of information; either explicit or intuitive, concerning the quality
of the various implementations will surely be passed around on this mailing
list.  I believe that this information must be treated confidentially.  I
know of two recent instances when perceived bugs in our implementation of
Common Lisp were brought up in sales situations.  I can not actively
participate in these discussions unless we all intend to keep this
information private.

I disagree with the last point Bob's "Hello" mail - the long term maintenance
of the test suite (however, I agree that we have time to work this out).
I believe that our recommendation should be that ARPA immediately fund a
third party to create/maintain/administer language validation.

One big reason is to guarantee impartiality and to protect ourselves.
If Common Lisp validation becomes a requirement for software on RFPs,
big bucks might be a stake and we need guarantee that the process is
impartial and, I think, we want a lot of distance between ourselves and
the validation process.  I don't want to get sued by XYZ inc. because their
implementation didn't pass and this caused them to lose a contract and go
out of business.

Of course, if ARPA isn't willing to fund this, then we Common Lispers will
have to do something ourselves.  It would be useful if we could get
some preliminary indication from ARPA about their willingness to fund
this type effort.

∂12-Nov-84  1237	FAHLMAN@CMU-CS-C.ARPA 	validation process    
Received: from CMU-CS-C.ARPA by SU-AI.ARPA with TCP; 12 Nov 84  12:36:09 PST
Received: ID <FAHLMAN@CMU-CS-C.ARPA>; Mon 12 Nov 84 15:35:13-EST
Date: Mon, 12 Nov 1984  15:35 EST
Message-ID: <FAHLMAN.12063043155.BABYL@CMU-CS-C.ARPA>
Sender: FAHLMAN@CMU-CS-C.ARPA
From: "Scott E. Fahlman" <Fahlman@CMU-CS-C.ARPA>
To:   brown@DEC-HUDSON.ARPA
Cc:   cl-validation@SU-AI.ARPA
Subject: validation process
In-reply-to: Msg of 12 Nov 1984  14:26-EST from brown at DEC-HUDSON


I don't see how confidentiality of validation results can be maintained
when the validation suites are publically available (as they must be).
If DEC has 100 copies of its current Common Lisp release out in
customer-land, and if the validation programs are generally available to
users and manufacturers alike, how can anyone reasonably expect that
users will not find out that this release fails test number 37?  I think
that any other manufacturer had better be without sin before casting the
first stone in a sales presentation, but certainly there will be some
discussion of which implementations are fairly close and which are not.
As with benchmarks, it will take some education before the public can
properly interpret the results of such tests, and not treat the lack of
some :FROM-END option as a sin of equal magnitude to the lack of a
package system.

The only alternative that I can see is to keep the validation suite
confidential in some way, available only to manufacturers who promise to
run it on their own systems only.  I would oppose that, even if it means
that some manufacturers would refrain from contributing any tests that
their own systems would find embarassing.  It seems to me that making
the validation tests widely available is the only way to make them
widely useful as a standardization tool and as something that can be
pointed at when a contract wants to specify Common Lisp.  Of course, it
would be possible to make beta-test users agree not to release any
validation results, just as they are not supposed to release benchmarks.

I agree with Gary that we probably DO want some organization to be the
official maintainer of the validation stuff, and that this must occur
BEFORE validation starts being written into RFP's and the like.  We
would have no problem with keeping the validation stuff online here at
CMU during the preliminary development phase, but as soon as the lawyers
show up, we quit.

-- Scott

∂12-Nov-84  1947	fateman%ucbdali@Berkeley 	Re:  validation process 
Received: from UCB-VAX.ARPA by SU-AI.ARPA with TCP; 12 Nov 84  19:47:22 PST
Received: from ucbdali.ARPA by UCB-VAX.ARPA (4.24/4.39)
	id AA10218; Mon, 12 Nov 84 19:49:39 pst
Received: by ucbdali.ARPA (4.24/4.39)
	id AA13777; Mon, 12 Nov 84 19:43:29 pst
Date: Mon, 12 Nov 84 19:43:29 pst
From: fateman%ucbdali@Berkeley (Richard Fateman)
Message-Id: <8411130343.AA13777@ucbdali.ARPA>
To: brown@DEC-HUDSON, cl-validation@su-ai
Subject: Re:  validation process

I think that confidentiality of information on this mailing list is
unattainable, regardless of its desirability.

∂13-Nov-84  0434	brown@DEC-HUDSON 	Confidentially loses  
Received: from DEC-HUDSON.ARPA by SU-AI.ARPA with TCP; 13 Nov 84  04:34:11 PST
Date: Tue, 13 Nov 84 07:35:21 EST
From: brown@DEC-HUDSON
Subject: Confidentially loses
To: fahlman@cmu-cs-c
Cc: cl-validation@su-ai

I guess you are right.  I can't expect the results of public domain tests
or the communications on this mailing list to be treated confidentially.
So, I retract the issue.  I'll make that my own comments are not "sensitive".
-Gary

∂18-Dec-85  1338	PACRAIG@USC-ISIB.ARPA 	Assistance please?    
Received: from USC-ISIB.ARPA by SU-AI.ARPA with TCP; 18 Dec 85  13:36:21 PST
Date: 18 Dec 1985 11:17-PST
Sender: PACRAIG@USC-ISIB.ARPA
Subject: Assistance please?
From:  Patti Craig <PACraig@USC-ISIB.ARPA>
To: CL-VALIDATION@SU-AI.ARPA
Message-ID: <[USC-ISIB.ARPA]18-Dec-85 11:17:56.PACRAIG>

Hi,

Need some information relative to the CL-VALIDATION@SU-AI
mailing list.  Would the maintainer of same please contact
me.

Thanks,

Patti Craig
USC-Information Sciences Institute

∂12-Mar-86  2357	cfry%OZ.AI.MIT.EDU@MC.LCS.MIT.EDU 	Validation proposal 
Received: from MC.LCS.MIT.EDU by SU-AI.ARPA with TCP; 12 Mar 86  23:56:26 PST
Received: from MOSCOW-CENTRE.AI.MIT.EDU by OZ.AI.MIT.EDU via Chaosnet; 13 Mar 86 02:55-EST
Date: Thu, 13 Mar 86 02:54 EST
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: Validation proposal
To: berman@ISI-VAXA.ARPA, cl-validation@SU-AI.ARPA
Message-ID: <860313025420.4.CFRY@MOSCOW-CENTRE.AI.MIT.EDU>

We need to have a standard format for validation tests.
To do this, I suggest we hash out a design spec
before we get serious about assigning chapters to implementors.
I've constructed a system which integrates diagnostics and
hacker's documentation. I use it and it saves me time.
Based on that, here's my proposal for a design spec.

GOAL [in priority order]
   To verify that a given implementation is or is not correct CL.
   To aid the implementor in finding out the discrepancies between
      his implementation and the agreed upon standard.
   To suppliment CLtL by making the standard more prescise.
   To provide examples for future CLtLs, or at least a format
      for machine-readable examples, which will make it easier to
      verify that the examples are, in fact, correct.
   ..... the below of auxiliary importance
   To facilitate internal documentation [documenatation
      used primarily by implementaors while developing]
   To give CL programmers a suggested format for diagnostics and
      internal documentation. [I argue that every programmer of
      a medium to large program could benifit from such a facility].

RELATION of validation code to CL
   It should be part of yellow pages, not CL.

IMPLEMENTATION: DESIRABLE CHARACTERISTICS
   small amount of code
   uses a small, simple subset of CL so that:
        1. implementors can use it early in the developement cycle
        2. It will depend on little and thus be more reliable.
            [we want to test specific functions in a controlled way,
             not the code that implements the validation software.]
    We could, for example, avoid using: 
          macros, 
          complex lambda-lists,
          sequences, 
          # reader-macros, 
          non-fixnum numbers

FEATURES & USER INTERFACE:
   simple, uniform, lisp syntax

   permit an easy means to test:
     - all of CL
     - all of the functions defined in a file. 
     - all of the tests for a particular function
     - individual calls to functions.

   Allow a mechanism for designating certain calls as
      "examples" which illustrate the functionality of the
      function in question. Each such example should have
        -the call
        -the expected result [potentially an error]
        -an optional explaination string ie 
           "This call errored because the 2nd arg was not a number."

----------
Here's an example of diagnostics for a function:

(test:test 'foo
  '((test:example (= (foo 2 3) 5)  "foo returns the sum of its args.")
     ;the above is a typical call and may be used in a manual along
     ;with the documentation string of the fn
    (not (= (foo 4 5) -2))
     ;a diagnostic not worthy of being made an example of. There will
     ;generally be several to 10's of such calls.
    (test:expected-error (foo 7) "requires 2 arguments")
       ;if the expression is evaled, it should cause an error
    (test:bug (foo 3 'bar) "fails to check that 2nd arg is not a number")
      ;does not perform as it should. Such entries are a convienient place
      ;for a programmer to remind himself that the FN isn't fully debugged yet.
    (test:bug-that-crashes (foo "trash") "I've GOT to check the first arg with numberp!")
  ))

TEST is a function which sequentially processes the elements of the 
list which is its 2nd arg. If an entry is a list whose car is:
   test:example      evaluate the cadr. if result is non-nil
                     do nothing, else print a bug report.
   test:expected-error  evaluate the cadr. If it does not produce an error,
                     then print a bug report.
   test:bug          evaluate the cadr. it should return NIL or error.
                     If it returns NIL or error, print a "known" bug report.
                      otherwise print "bug fixed!" message.
                     [programmer should then edit the entry to not be wrapped in
                     a test:bug statement.]
   test:bug-that-crashes Don't eval the cadr. Just print the
                     "known bug that crashes" bug report.
  There's a bunch of other possibilities in this area, like:
  test:crash-example  don't eval the cadr, but use this in documentation
  
  Any entry without a known car, will just get evaled, if it returns nil or errors,
    print a bug report. The programmer can then fix the bug, or wrap a
   test:bug around the call to acknowledge the bug. This helps separate the
   "I've seen this bug before" cases from the "this is a new bug" cases.

With an editor that permits evaluation of expressions, [emacs and sons]
its easy to eval single calls or the whole test.
When evaluating the whole test, a summary of what went wrong can be
printed at the end of the sequence like "2 bugs found".

I find it convienient to place calls to test right below the definition
of the function that I'm testing. My source code files are about
half tests and half code. I have set up my test function such that
it checks to see if it is being called as a result of being loaded
from a file. If so, it does nothing. Our compiler is set up to
ignore calls to TEST, so they don't get into compiled files.

I have a function called TEST-FILE which reads each form in the file.
If the form is a list whose car is TEST, the form is evaled, else the
form is ignored.

Some programmers prefer to keep tests in a separate file from the
source code that they are writting. This is just fine in my implementation,
except that that a list of the source code files can't be used in
testing a whole system unless there's a simple mapping between
source file name and test file name.

Its easy to see how a function could read through a file and pull
put the examples [amoung other things].

Since the first arg to the TEST fn is mainly used to tell the user what
test is being performed, it could be a string explaining in more
detail the catagory of the below calls, ie "prerequisites-for-sequences" .

Notice that to write the TEST function itself, you need not have:
macros, &optional, &rest, or &key working, features that minimal lisps
often lack.

Obviously this proposal could use creativity of many sorts.
Our actual spec should be to make the file format, though, not
to add fancy features. Such features can vary from implementation to
implementation, which will aid evolution of automatic diagnostics and
documentation software. 
But to permit enough hooks in the file format, we need insight as to the potential
breadth of such a mechanism. Thus, new goals might also be a valuable
addition to this proposal.

FRY

∂13-Mar-86  1015	berman@isi-vaxa.ARPA 	Re: Validation proposal
Received: from ISI-VAXA.ARPA by SU-AI.ARPA with TCP; 13 Mar 86  10:12:38 PST
Received: by isi-vaxa.ARPA (4.12/4.7)
	id AA03979; Thu, 13 Mar 86 10:12:11 pst
From: berman@isi-vaxa.ARPA (Richard Berman)
Message-Id: <8603131812.AA03979@isi-vaxa.ARPA>
Date: 13 Mar 1986 1012-PST (Thursday)
To: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Cc: berman@ISI-VAXA.ARPA, cl-validation@SU-AI.ARPA
Subject: Re: Validation proposal
In-Reply-To: Your message of Thu, 13 Mar 86 02:54 EST.
             <860313025420.4.CFRY@MOSCOW-CENTRE.AI.MIT.EDU>


Christopher,

Thanks for the suggestion.  Unfortunately there are already many thousands of
lines of validation code written amongst a variety of sources.  ISI is
supposed to first gather these and then figure out which areas are covered,
and in what depth.  

A single validation suite will eventually be constructed with the existing
tests as a starting point.  Therefore, we will probably not seriously consider
a standard until we have examined this extant code.  I'll keep CL-VALIDATION
informed of the sort of things we discover, and at some point I will ask for
proposals, if indeed I don't put one together myself.

Once we know what areas are already covered, we will assign the remaining
areas to the various willing victims (er, volunteers) to complete, and it is
this part of the suite which will be created with a standard in place.

Etc.,

RB

∂13-Mar-86  1028	berman@isi-vaxa.ARPA 	Re: Validation proposal
Received: from ISI-VAXA.ARPA by SU-AI.ARPA with TCP; 13 Mar 86  10:28:21 PST
Received: by isi-vaxa.ARPA (4.12/4.7)
	id AA04181; Thu, 13 Mar 86 10:27:56 pst
From: berman@isi-vaxa.ARPA (Richard Berman)
Message-Id: <8603131827.AA04181@isi-vaxa.ARPA>
Date: 13 Mar 1986 1027-PST (Thursday)
To: Christopher Fry <cfry@MIT-OZ%MIT-MC.ARPA>
Cc: berman@ISI-VAXA.ARPA, cl-validation@SU-AI.ARPA
Subject: Re: Validation proposal
In-Reply-To: Your message of Thu, 13 Mar 86 02:54 EST.
             <860313025420.4.CFRY@MOSCOW-CENTRE.AI.MIT.EDU>


Christopher,

Thanks for the suggestion.  Unfortunately there are already many thousands of
lines of validation code written amongst a variety of sources.  ISI is
supposed to first gather these and then figure out which areas are covered,
and in what depth.  

A single validation suite will eventually be constructed with the existing
tests as a starting point.  Therefore, we will probably not seriously consider
a standard until we have examined this extant code.  I'll keep CL-VALIDATION
informed of the sort of things we discover, and at some point I will ask for
proposals, if indeed I don't put one together myself.

Once we know what areas are already covered, we will assign the remaining
areas to the various willing victims (er, volunteers) to complete, and it is
this part of the suite which will be created with a standard in place.

Etc.,

RB


P.S.
I had to change your address (see header) 'cuz for some reason our mail
handler threw up on the one given with your message.


∂17-Mar-86  0946	berman@isi-vaxa.ARPA 	Re: Validation proposal
Received: from ISI-VAXA.ARPA by SU-AI.ARPA with TCP; 17 Mar 86  09:46:27 PST
Received: by isi-vaxa.ARPA (4.12/4.7)
	id AA11654; Mon, 17 Mar 86 09:46:19 pst
From: berman@isi-vaxa.ARPA (Richard Berman)
Message-Id: <8603171746.AA11654@isi-vaxa.ARPA>
Date: 17 Mar 1986 0946-PST (Monday)
To: cfry%oz@MIT-MC.ARPA
Cc: cl-Validation@su-ai.arpa
Subject: Re: Validation proposal
In-Reply-To: Your message of Mon, 17 Mar 86 04:30 EST.
             <860317043024.5.CFRY@DUANE.AI.MIT.EDU>


Thanks, and I look forward to seeing your tests.  And yes, I'm sure that
interested parties will get to review the test system before its in place.

RB



------- End of Forwarded Message

∂19-Mar-86  1320	berman@isi-vaxa.ARPA 	Re: Validation Contributors 
Received: from ISI-VAXA.ARPA by SU-AI.ARPA with TCP; 19 Mar 86  13:20:08 PST
Received: by isi-vaxa.ARPA (4.12/4.7)
	id AA08917; Wed, 19 Mar 86 13:19:50 pst
From: berman@isi-vaxa.ARPA (Richard Berman)
Message-Id: <8603192119.AA08917@isi-vaxa.ARPA>
Date: 19 Mar 1986 1319-PST (Wednesday)
To: Reidy.pasa@Xerox.COM
Cc: Reidy.pasa@Xerox.COM, berman@isi-vaxa.ARPA, CL-Validation@su-ai.ARPA
Subject: Re: Validation Contributors
In-Reply-To: Your message of 19 Mar 86 11:29 PST.
             <860319-112930-3073@Xerox>


As a matter of fact, in the end it WILL be organized parrallel to the book.
For now I'm just gathering the (often extensive) validation suites that have
been produced at various sites.  These will need to be evaluated before
assigning tasks to people who want to write some code for this.  By that time
we will also have a standard format for these tests so that this new code will
fit in with the test manager.

Send messages to CL-VALIDATION@SU-AI.ARPA rather than CL general list when
discussing this, unless it is of broader interest ofcourse.

Thanks.

RB

∂27-Mar-86  1332	berman@isi-vaxa.ARPA 	Validation Distribution Policy   
Received: from ISI-VAXA.ARPA by SU-AI.ARPA with TCP; 27 Mar 86  13:32:16 PST
Received: by isi-vaxa.ARPA (4.12/4.7)
	id AA22595; Thu, 27 Mar 86 13:32:06 pst
From: berman@isi-vaxa.ARPA (Richard Berman)
Message-Id: <8603272132.AA22595@isi-vaxa.ARPA>
Date: 27 Mar 1986 1332-PST (Thursday)
To: CL-Validation@su-ai.arpa
Subject: Validation Distribution Policy



------- Forwarded Message

Return-Path: <OLDMAN@USC-ISI.ARPA>
Received: from USC-ISI.ARPA by isi-vaxa.ARPA (4.12/4.7)
	id AA13746; Wed, 26 Mar 86 13:35:26 pst
Date: 26 Mar 1986 16:24-EST
Sender: OLDMAN@USC-ISI.ARPA
Subject: Validation in CL
From: OLDMAN@USC-ISI.ARPA
To: berman@ISI-VAXA.ARPA
Message-Id: <[USC-ISI.ARPA]26-Mar-86 16:24:40.OLDMAN>

Yes, we have tests and a manager.  I have started the wheels
moving on getting an OK from management for us to donate them.
Is there a policy statement on how they will be used or
distributed available? ...

-- Dan Oldman

------- End of Forwarded Message

I don't recall any exact final statement of the type of access.  I remember
there was some debate on whether it should be paid for by non-contributors,
but was there any conclusion?

RB

∂29-Mar-86  0819	FAHLMAN@C.CS.CMU.EDU 	Validation Distribution Policy   
Received: from C.CS.CMU.EDU by SU-AI.ARPA with TCP; 29 Mar 86  08:19:13 PST
Received: ID <FAHLMAN@C.CS.CMU.EDU>; Sat 29 Mar 86 11:19:51-EST
Date: Sat, 29 Mar 1986  11:19 EST
Message-ID: <FAHLMAN.12194592953.BABYL@C.CS.CMU.EDU>
Sender: FAHLMAN@C.CS.CMU.EDU
From: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
To:   berman@λisi-vaxa.ARPA (Richard Berman)λ
Cc:   CL-Validation@SU-AI.ARPA
Subject: Validation Distribution Policy
In-reply-to: Msg of 27 Mar 1986  16:32-EST from berman at isi-vaxa.ARPA (Richard Berman)


    I don't recall any exact final statement of the type of access.  I remember
    there was some debate on whether it should be paid for by non-contributors,
    but was there any conclusion?

I believe that the idea that free access to the validation code be used
as an incentive to get companies to contribute was discussed at the
Boston meeting, but finally abandoned as being cumbersome, punitive, and
not necessary.  Most of the companies there agreed to contribute
whatever vailidation code they had, and/or some labor to fill any holes
in the validation suite, with the understanding that the code would be
pulled into a reasonably coherent form at ISI and then would be made
freely available to all members of the community.  This release would
not occur until a number of companies had contributed something
significant, and then the entire collection up to that point would be
made available at once.

I believe that Dick Gabriel was the first to say that his company would
participate under such a plan, and that he had a bunch of conditions
that had to be met.  If there are any not captured by the above
statement, maybe he can remind us of them.

-- Scott

∂16-Jun-86  1511	berman@isi-vaxa.ARPA 	Validation Suite  
Received: from ISI-VAXA.ARPA by SU-AI.ARPA with TCP; 16 Jun 86  15:11:47 PDT
Received: by isi-vaxa.ARPA (4.12/4.7)
	id AA19003; Mon, 16 Jun 86 15:11:38 pdt
From: berman@isi-vaxa.ARPA (Richard Berman)
Message-Id: <8606162211.AA19003@isi-vaxa.ARPA>
Date: 16 Jun 1986 1511-PDT (Monday)
To: CL-VALIDATION@su-ai.arpa
Cc: berman@isi-vaxa.ARPA
Subject: Validation Suite


Well, now that some of the contributions to the Great Validation Suite have
begun to filter in, I have been asked to make a report for broad issue on 1
July summarizing the status of all the validation contributions.

I hope this is enough time so that everything can be whipped into shape.
Please do contact me regarding the status of your validation and how its
progressing.  If I haven't yet contacted you, please send me a mesage.  You
may not be on my list.  (Also, I cannot seem to reach a few of you via network
for whatever reason).

So...

I DO need you validation contributions.

We ARE putting together a master validation suite, once more of the
contributions arrive.

Thanks.

Richard Berman
USC/ISI
(213) 822-1511

∂09-Jul-86  1213	berman@vaxa.isi.edu 	Validation Control 
Received: from VAXA.ISI.EDU by SU-AI.ARPA with TCP; 9 Jul 86  12:09:58 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA27003; Wed, 9 Jul 86 12:09:47 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607091909.AA27003@vaxa.isi.edu>
Date:  9 Jul 1986 1209-PDT (Wednesday)
To: CL-Validation@SU-AI.ARPA
Cc: 
Subject: Validation Control


Well, I've got quite a goodly collection of tests from which to construct a
first pass suite.  Here's the situation:  Each set of tests (from the various
vendors) uses it's own control mechanism, usually in the form of some macro
surrounding a (set of) test(s).  Some require an error handler.

By and large all tests take a similar fo.  Each is composed of a few parts:

1.  A form to evaluate.
2.  The desired result.
3.  Some kind of text for error reporting.

Some versions give each test a unique name.

Some versions specify a test "type", e.g. evaltest means to evaluate the form,
errortest means the test should generate an error (and so the macro could
choose not to do anything with the test if no error handling is present).

What I am looking for is a simple and short proposal for how to
arrange/organize tests in the suite.  Currently I am organizing according to
sections in CLtL.  This isn't entirely sufficient, especially for some of the
changes that have been accepted since its publication.  

So what kind of control/reporting/organizing method seems good to you?

As I am already organizing this, please do not delay.  If enough inertia
builds up then whatever I happen to decide will end up as the first pass.  So
get your tickets NOW!

RB

∂22-Jul-86  1344	berman@vaxa.isi.edu 	test control  
Received: from VAXA.ISI.EDU by SU-AI.ARPA with TCP; 22 Jul 86  13:44:07 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA16122; Tue, 22 Jul 86 13:44:01 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607222044.AA16122@vaxa.isi.edu>
Date: 22 Jul 1986 1343-PDT (Tuesday)
To: cl-validation@su-ai.arpa
Cc: 
Subject: test control


I am preparing the first cut at the test suite.  Each test is wrapped in a
macro, which I propose below:
←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←

The macro for test control should allow for the following:

1.  Contributor string.  Who wrote/contributed it.

2.  Test I.D.  In most cases this would be just the name of the function.  In
other cases it may be an identifier as to what feature is being tested, such
as SCOPING.
 
3.  Test type.  E.g. Eval, Error, Ignore, etc.

4.  N tests (or pairs of tests and expected results).

5.  Side effects testing.  With each test from #4 above it should be possible
to give n forms which must all evaluate to non-NIL.

6.  Test name. Unique for each test.

7.  Form to evaluate if test fails.  This may be useful later to help analyze
beyond the first order.

8.  Error string.


In number 2 above, the identifier must be selected from amongst those provided
in a database.  This database relates identifiers to section numbers (or to
some other ordering scheme) and is used by some form of test management to
schedule the sequence of testing.  This allows for automatic ordering.  For
example, all the function names are in the database, as well as such "topics"
as scoping, error detection, etc.

For now the ordering database will probably be aligned with the silver book,
but later on I expect it will be organized parallel with the language spec.

←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←

Note:  I've already got the data base in some form.  What I want to know from
you as a test contributor (or potential contributor) is:  Does the above macro
provide enough information for adequate control and analysis, in you opinion?

Suggestions should be sent soon, because I'm gonna be implementing it in the
next 10 days.

Best,

RB

∂23-Jul-86  2104	NGALL@G.BBN.COM 	Re: test control  
Received: from BBNG.ARPA by SAIL.STANFORD.EDU with TCP; 23 Jul 86  21:03:48 PDT
Date: 24 Jul 1986 00:00-EDT
Sender: NGALL@G.BBN.COM
Subject: Re: test control
From: NGALL@G.BBN.COM
To: berman@ISI-VAXA.ARPA
Cc: cl-validation@SU-AI.ARPA
Message-ID: <[G.BBN.COM]24-Jul-86 00:00:45.NGALL>
In-Reply-To: <8607222044.AA16122@vaxa.isi.edu>

	
    Date: 22 Jul 1986 1343-PDT (Tuesday)
    From: berman@vaxa.isi.edu (Richard Berman)
    To: cl-validation@su-ai.arpa
    Subject: test control
    Message-ID: <8607222044.AA16122@vaxa.isi.edu>
    
    
    I am preparing the first cut at the test suite.  Each test is wrapped in a
    macro, which I propose below:
    ←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←
    
    The macro for test control should allow for the following:
    
    1.  Contributor string.  Who wrote/contributed it.
    
    2.  Test I.D.  In most cases this would be just the name of the function.  In
    other cases it may be an identifier as to what feature is being tested, such
    as SCOPING.
     
    3.  Test type.  E.g. Eval, Error, Ignore, etc.
    
    4.  N tests (or pairs of tests and expected results).
    
    5.  Side effects testing.  With each test from #4 above it should be possible
    to give n forms which must all evaluate to non-NIL.
    
    6.  Test name. Unique for each test.
    
    7.  Form to evaluate if test fails.  This may be useful later to help analyze
    beyond the first order.
    
    8.  Error string.
    
    
    In number 2 above, the identifier must be selected from amongst those provided
    in a database.  This database relates identifiers to section numbers (or to
    some other ordering scheme) and is used by some form of test management to
    schedule the sequence of testing.  This allows for automatic ordering.  For
    example, all the function names are in the database, as well as such "topics"
    as scoping, error detection, etc.
    
    For now the ordering database will probably be aligned with the silver book,
    but later on I expect it will be organized parallel with the language spec.
    
    ←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←
    
    Note:  I've already got the data base in some form.  What I want to know from
    you as a test contributor (or potential contributor) is:  Does the above macro
    provide enough information for adequate control and analysis, in you opinion?
    
    Suggestions should be sent soon, because I'm gonna be implementing it in the
    next 10 days.
    
    Best,
    
    RB
    
	      --------------------
		
How about a field that indicates which revision of CL this test
applies to?

-- Nick

∂24-Jul-86  0254	@REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU 	test control    
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 24 Jul 86  02:53:02 PDT
Received: from DUANE.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 40517; Thu 24-Jul-86 05:55:50-EDT
Date: Thu, 24 Jul 86 05:54 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: test control
To: berman@vaxa.isi.edu, cl-validation@SU-AI.ARPA
In-Reply-To: <8607222044.AA16122@vaxa.isi.edu>
Message-ID: <860724055418.1.CFRY@DUANE.AI.MIT.EDU>


    I am preparing the first cut at the test suite.  Each test is wrapped in a
    macro, which I propose below:
    ←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←

    The macro for test control should allow for the following:

    1.  Contributor string.  Who wrote/contributed it.
Nice to keep around. But won't you generally have a whole bunch of tests
in a file from 1 contributor? You shouldn't have to have their name
on every test.

    2.  Test I.D.  In most cases this would be just the name of the function.  In
    other cases it may be an identifier as to what feature is being tested, such
    as SCOPING.
 
    3.  Test type.  E.g. Eval, Error, Ignore, etc.
Please be more specific on what this means.

    4.  N tests (or pairs of tests and expected results).
Typically how large is N? 1, 10, 100, 1000?

    5.  Side effects testing.  With each test from #4 above it should be possible
    to give n forms which must all evaluate to non-NIL.
Particularly for a large N, side effect testing should be textually adjcent to
whatever its affecting.

    6.  Test name. Unique for each test.
This should be adjacent to test-id

    7.  Form to evaluate if test fails.  This may be useful later to help analyze
    beyond the first order.
typically NIL ? By "TEST" do you mean if one of the above N fails ,eval this form?
Should it be evaled for each of the N that fail?

    8.  Error string.
Similar to above?

    In number 2 above, the identifier must be selected from amongst those provided
    in a database.  This database relates identifiers to section numbers (or to
    some other ordering scheme) and is used by some form of test management to
    schedule the sequence of testing.  This allows for automatic ordering.  For
    example, all the function names are in the database, as well as such "topics"
    as scoping, error detection, etc.

    For now the ordering database will probably be aligned with the silver book,
    but later on I expect it will be organized parallel with the language spec.

    ←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←

    Note:  I've already got the data base in some form.  What I want to know from
    you as a test contributor (or potential contributor) is:  Does the above macro
    provide enough information for adequate control and analysis, in you opinion?

    Suggestions should be sent soon, because I'm gonna be implementing it in the
    next 10 days.

Above is not only ambiguous, but too abstract to get a feel for it.
Send us several examples, both typical and those at the extreme ranges of
size and complexity. I want to see the actual syntax.

Guessing at what you mean here, it looks like its going to take someone a very
long time to make the tests in such a complex format.
And you lose potential flexibility.
My format distributes control much more locally to each form to be evaled.
And it allows for simple incremental add-ons for things you missed in the spec
the first time around. For example, the "EXPECT-ERROR" fn below is such an add-on.
It is not an integral part of the diagnostic-controller, which itself is
quite simple.

To re-iterate my plan:
There's a wrapper for a list of forms to evaluate, typically 5 to 20 forms.
Each form is evaled and if it returns NON-NIL, it passes.
Example:
(test '+
  (= (+ 2 3) 5)
  (expect-error (+ "2" "3")) ;returns T if the call to + errors
  (setq foo (+ 1 2))
  (= foo 3) ;tests side effect. The forms are expected to be evaled sequentially.
   ;anything that depends on a particular part of the environment to be "clean"
   ;before it tests something should have forms that clean it up first,
   ; like before the above call to setq you might say (makunbound 'foo)
  (progn (bar) t) ; one way of testing a form where it is expected not to error
    ;but don't care if it returns NIL or NON-NIL. If you found you were using this
    ;idiom a lot, you could write DONT-CARE trivially, as an add-on.
)

If you really wanted to delcare that a particular call tested a side-effect, or that
a particular call produced a side-effect, you could write a small wrapper fn for it,
but I'd guess that wouldn't be worth the typing. Such things should be obvious from
context.

Programmers are very reluctant to write diagnostics, so lets try to
make it as painless as possible. Maybe there could be some 
macros that would fill in certain defaults of your full-blown format.

One of the things that's so convienient about my mechanism is that
a hacker can chose to, with a normal lisp text editor, eval part of
a call, a whole call, a group of calls [by selecting the region],
a whole TEST, or via my fn "test-file" a whole file.
[I also have "test-module" functionality for a group of files.]
Having this functionality makes the diagnostics more than just
a "validation" suite. It makes it a real programming tool.
And thus it will get used more often, and the tests themselves will
get performed more often.
This will lead to MORE tests as well as MORE TESTED tests, which
also implies that hackersimplementors will have more tested implementations,
which, after all, furthers the ultimate goal of having accurate
implementations out there.

.....
Before settling on a standard format, I'd also recommend just
converting a large file of tests into the proposed format
[before implementing the code that performs the test].

This will help you feel redundancies in the format
by noticing your worn out fingers.
But it will also help you see what parts of the syntax are
hard to remember and in need of more keywords or better named
functions, or less nested parens.

If the proposed format passes this test, it can be used as the
TEST code for the TEST software itself, as well as testing CL.
If not, you didn't waste time implementing a bad spec.

Despite the volume of my comments, I'm glad you're getting
down to substantial issues on what features to include.

CFry 

∂24-Jul-86  1053	berman@vaxa.isi.edu 	Re: test control   
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 24 Jul 86  10:50:55 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA04581; Thu, 24 Jul 86 10:49:05 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607241749.AA04581@vaxa.isi.edu>
Date: 24 Jul 1986 1049-PDT (Thursday)
To: NGALL@G.BBN.COM
Cc: cl-validation@SU-AI.ARPA, berman@ISI-VAXA.ARPA
Subject: Re: test control
In-Reply-To: Your message of 24 Jul 1986 00:00-EDT.
             <[G.BBN.COM]24-Jul-86 00:00:45.NGALL>


'Cuz the whole suite will be for a particular revision.  There will
be no tests in the suite that do not apply to the particular level/revision.

RB

∂24-Jul-86  1148	marick%turkey@gswd-vms.ARPA 	Re: test control
Received: from GSWD-VMS.ARPA by SAIL.STANFORD.EDU with TCP; 24 Jul 86  11:22:08 PDT
Received: from turkey.GSD (turkey.ARPA) by gswd-vms.ARPA (5.51/)
	id AA11926; Thu, 24 Jul 86 13:20:56 CDT
Message-Id: <8607241820.AA11926@gswd-vms.ARPA>
Date: Thu, 24 Jul 86 13:20:47 CDT
From: marick%turkey@gswd-vms.ARPA (Brian Marick)
To: berman@vaxa.isi.edu, cl-validation@su-ai.arpa
Subject: Re: test control


I have trouble visualizing what a test looks like.  Could you provide 
examples?  

Some general comments:

1.  I hope that often-unnecessary parts of a test (like contributor
string, error string, form-to-evaluate-if-test-fails) are optional.

2.  It would be nice if the test driver were useful for small-scale
regression testing.  (That is, "I've changed TREE-EQUAL.  O driver,
please run all the tests for TREE-EQUAL.")  It seems you have this in
mind, but I just wanted to reinforce any tendencies.

3.  The format of the database should be published, since people will
want to write programs that use it.

4.  It's very useful to have an easy way of specifying the predicate to
use when comparing the actual result to the expected result.
The test suite ought to come with a library of such predicates.

5.  I'd like to see a complete list of test types.  What a test type is
is a bit fuzzy, but we have at least the following:

  ordinary -- form evaluated and compared to unevaluated expected result.
	      (This is a convenience; you get tired of typing ')
  eval -- form evaluated and compared to evaluated expected result.
  fail -- doesn't run the test, just notes that there's an error.  This
          is used when an error breaks the test harness; it shouldn't 
	  appear in the distributed suite, of course, but it will be
	  useful for people using the test suite in day-to-day regression
	  testing.
  error -- the form is expected to signal an error; it fails if it does
          not.
  is-error -- if the form signals an error it passes.  If it doesn't signal
	  an error, it passes only if it matches the "expected" result.
	  We use this to make sure that some action which is defined to
	  be "is an error" produces either an error or some sensible result.
 	  It may not be appropriate for the official suite.  (Note that there
	  really should be an evaluating and a non-evaluating version.)


6.  Then you need to cross all those test types with a raft of issues
surrounding the compiler.  Like:

a. For completeness, you should run the tests interpreted, compiled with
#'COMPILE, and compiled with #'COMPILE-FILE. (What COMPILE-FILE does
might not be a strict superset of what COMPILE does.)

b. Suppose you're testing a signalled error.  What happens if the error
is detected at compile time?  (This is something like the IS-ERROR case
above: either the compile must fail or running the compiled version
should do the same thing the interpreted version does.)

c. It may be the case that compiled code does less error checking than
interpreted code.  OPTIMIZE switches can have the same effect.  So you may 
want to write tests that expect errors in interpreted code, but not in 
compiled code.  (This, again, is probably not relevant to the official test 
suite, but, again, the easier it is to tune the test suite, the happier 
implementors will be.)

6.  What does the output look like?  This test suite is going to be
huge, so it's especially important that you be able to easily find
differences between successive runs.

∂24-Jul-86  1546	berman@vaxa.isi.edu 	    
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 24 Jul 86  12:38:39 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA06466; Thu, 24 Jul 86 12:35:53 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607241935.AA06466@vaxa.isi.edu>
Date: 24 Jul 1986 1235-PDT (Thursday)
To: marick%turkey@gswd-vms.ARPA
Cc: cl-validation@su-ai.arpa
Subject: 


Let me clarify.  First, I don't think this macro is used to control testing so
much as it is to help maintain the actual testing suite itself.  The testing
suite is supposed to eventually incarnate under ISI's FSD data base fascility,
as described in the proposal that I offered to one and all a short while back.

What this macro should do is allow me to build a test suite from amongst all
the tests.  With that in mind:


	1.  I hope that often-unnecessary parts of a test (like contributor
	string, error string, form-to-evaluate-if-test-fails) are optional.

Probably, except for contributor.  The others can be NIL or created from other
data.

	2.  It would be nice if the test driver were useful for small-scale
	regression testing.  (That is, "I've changed TREE-EQUAL.  O driver,
	please run all the tests for TREE-EQUAL.")  It seems you have this in
	mind, but I just wanted to reinforce any tendencies.

Sure.

	3.  The format of the database should be published, since people will
	want to write programs that use it.

Unlikely.  See above re: FSD.  It can't "be published" as it is just part of a
live environment.

	4.  It's very useful to have an easy way of specifying the predicate to
	use when comparing the actual result to the expected result.
	The test suite ought to come with a library of such predicates.

Well -- you could be a little more clear on this.  Like what?  Also, it is the
contributors who will write these tests.  I imagine that most of the time an
EQ or EQUAL type would be used, and other less typical or special purpose
predicates will probably not be useful to other contributors.

	5.  I'd like to see a complete list of test types.  What a test type is
	is a bit fuzzy, but we have at least the following:

	  ordinary -- form evaluated and compared to unevaluated expected result.
	      (This is a convenience; you get tired of typing ')
		  eval -- form evaluated and compared to evaluated expected result.
	  fail -- doesn't run the test, just notes that there's an error.  This
	          is used when an error breaks the test harness; it shouldn't 
		  appear in the distributed suite, of course, but it will be
		  useful for people using the test suite in day-to-day regression
		  testing.
	  error -- the form is expected to signal an error; it fails if it does
	          not.
	  is-error -- if the form signals an error it passes.  If it doesn't signal
		  an error, it passes only if it matches the "expected" result.
		  We use this to make sure that some action which is defined to
		  be "is an error" produces either an error or some sensible result.
 		  It may not be appropriate for the official suite.  (Note that there
		  really should be an evaluating and a non-evaluating version.)

Sounds to me like you got the idea.  These are classifications of tests used
to control the testing process.  In addition, this being a part of the
database, one could create a test suite for just certain classes of tests.

And as for compiler stuff--for now it will probably just allow you to test
each test interpreted, compiled or both (possibly not in the very first cut).
Other issues will be taken up as the suite develops.

	6.  What does the output look like?  This test suite is going to be
	huge, so it's especially important that you be able to easily find
	differences between successive runs.

Each failing test will give some kind of report, identifying the test.  As the
suite develops, more sophisticated reporting will be developed that fills the
needs of developers.  How's that for using the word "develop" too much?


RB

∂24-Jul-86  1549	berman@vaxa.isi.edu 	test control  
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 24 Jul 86  13:05:59 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA06785; Thu, 24 Jul 86 13:03:54 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607242003.AA06785@vaxa.isi.edu>
Date: 24 Jul 1986 1303-PDT (Thursday)
To: cfy@OZ.AI.MIT.EDU
Cc: cl-validation@su-ai.arpa
Subject: test control


Please see my message to Marick, which answers some of your questions.  As for
the others:


	    1.  Contributor string.  Who wrote/contributed it.
	Nice to keep around. But won't you generally have a whole bunch of tests
	in a file from 1 contributor? You shouldn't have to have their name
	on every test.

Nope.  The tests will be separated into the various sections of the book under
which the test best fits.  These will then be assembled into a test for that
section.  Note also Marick's comments re regression analysis.

 
	    3.  Test type.  E.g. Eval, Error, Ignore, etc.
	Please be more specific on what this means.

See Marick's comments.

	    4.  N tests (or pairs of tests and expected results).
	Typically how large is N? 1, 10, 100, 1000?

I imagine N is very small.  It should be what you could call a "testing unit"
which does enough to conclusively report success/failure of some specific
thing being tested.


	    5.  Side effects testing.  With each test from #4 above it should be possible
	    to give n forms which must all evaluate to non-NIL.
	Particularly for a large N, side effect testing should be textually adjcent to
	whatever its affecting.

Certainly would enhance readability/maintanability, etc.


	    6.  Test name. Unique for each test.
	This should be adjacent to test-id

Sure.


	    7.  Form to evaluate if test fails.  This may be useful later to help analyze
	    beyond the first order.
	typically NIL ? By "TEST" do you mean if one of the above N fails ,eval this form?
	Should it be evaled for each of the N that fail?

Well, each thing wrapped by this macro should be a "testing unit" as above, so
if any of N fails the remaining tests in that macro probably won't be
executed, and this form will then be evaluated.

	    8.  Error string.
	Similar to above?

Not at all.  This is what to say in the event of an error.  It is optional
because a reporting mechanism can construct a message, but for more
readability or for other reasons (as deemed useful by the test implementor) a
canned string can be printed as well.


	Above is not only ambiguous, but too abstract to get a feel for it.
	Send us several examples, both typical and those at the extreme ranges of
	size and complexity. I want to see the actual syntax.

Well, I hope this and other messages help that problem.  As for syntax - until
it is implemented, there isn't any.  If you still don't see why this data is
needed, or if it isn't clear about the "database" stuff I mentioned, please
call me.


	Guessing at what you mean here, it looks like its going to take someone a very
	long time to make the tests in such a complex format.
	And you lose potential flexibility.

I couldn't disagree more.  I have received a great deal of testing material
and this is not much more "complex" than most.  It actually allows (in
conjunction with the testing database) a far more flexible testing regimen
than any I've seen.

(As for your methodology -- it has much merit.  Perhaps my use of parts of it
are too disguised here?)

	Programmers are very reluctant to write diagnostics, so lets try to
	make it as painless as possible. Maybe there could be some 
	macros that would fill in certain defaults of your full-blown format.

Only new contributions need to be in this format.  I would expect a wise
programmer to come up with a number of ways to automate this.  I for one would
not type my company name (contributor ID) for each one.


	One of the things that's so convienient about my mechanism is that
	a hacker can chose to, with a normal lisp text editor, eval part of
	a call, a whole call, a group of calls [by selecting the region],
	a whole TEST, or via my fn "test-file" a whole file.
	[I also have "test-module" functionality for a group of files.]
	Having this functionality makes the diagnostics more than just
	a "validation" suite. It makes it a real programming tool.
	And thus it will get used more often, and the tests themselves will
	get performed more often.
	This will lead to MORE tests as well as MORE TESTED tests, which
	also implies that hackersimplementors will have more tested implementations,
	which, after all, furthers the ultimate goal of having accurate
	implementations out there.

Certainly one goal is to make the tests useful.  We hope to have an online
(via network) capability for testers to request their own test suites, as
customized as we can.  For others, a testing file can be generated.  Have you
read the ISI proposal for CL support?

	.....
	Before settling on a standard format, I'd also recommend just
	converting a large file of tests into the proposed format
	[before implementing the code that performs the test].

Am doing that now, with the CDC test suite.

	This will help you feel redundancies in the format
	by noticing your worn out fingers.
	But it will also help you see what parts of the syntax are
	hard to remember and in need of more keywords or better named
	functions, or less nested parens.

You bet.


	If the proposed format passes this test, it can be used as the
	TEST code for the TEST software itself, as well as testing CL.
	If not, you didn't waste time implementing a bad spec.

As with any large (any many smaller) systems, the test suite will go through
the various stages of incrmental development.  I'm sure we'll discard a
paradigm or two on the way.

	Despite the volume of my comments, I'm glad you're getting
	down to substantial issues on what features to include.

	CFry 

Thank you.

I hope this is helpful.

RB


∂24-Jul-86  1740	FAHLMAN@C.CS.CMU.EDU 	FSD
Received: from C.CS.CMU.EDU by SAIL.STANFORD.EDU with TCP; 24 Jul 86  17:22:22 PDT
Received: ID <FAHLMAN@C.CS.CMU.EDU>; Thu 24 Jul 86 20:22:36-EDT
Date: Thu, 24 Jul 1986  20:22 EDT
Message-ID: <FAHLMAN.12225351684.BABYL@C.CS.CMU.EDU>
Sender: FAHLMAN@C.CS.CMU.EDU
From: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
To:   berman@λvaxa.isi.edu (Richard Berman)λ
Cc:   cl-validation@SU-AI.ARPA
Subject: FSD
In-reply-to: Msg of 24 Jul 1986  15:35-EDT from berman at vaxa.isi.edu (Richard Berman)


Maybe I should have read the earlier proposal more carefully.  This
"incarnate in FSD" business sounds scary.

I had the impression that FSD was an internal tool that you would be
using to maintain the vlaidation suite, but that the validation suite
itself would be one or more Common Lisp files that you can pass out to
people who want to test their systems.  Is that not true?  (This is
separate from the issue of whether validation is done at ISI or
elsewhere; the point is that it should be possible to release the test
suite if that's what we want to do.)  I would hope that the testing code
can be passed around without having to pass FSD around with it (unless
FSD is totally portable and public-domain).

-- Scott

∂25-Jul-86  0047	@REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU 	test control    
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 25 Jul 86  00:47:11 PDT
Received: from DUANE.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 40592; Fri 25-Jul-86 03:50:09-EDT
Date: Fri, 25 Jul 86 03:47 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: test control
To: berman@vaxa.isi.edu, cfry@OZ.AI.MIT.EDU
cc: cl-validation@SU-AI.ARPA
In-Reply-To: <8607242003.AA06785@vaxa.isi.edu>
Message-ID: <860725034736.3.CFRY@DUANE.AI.MIT.EDU>

I apologize for not including the text of the messages I'm replying
to here. Since its more than one, I have a hard time integrating them.
.......

Sounds like you're basically doing the right stuff, but I still
don't see why you don't present us with an example.

You mentioned that you wouldn't have one until the implementation
was complete, then you said you were converting the CDC tests
already. ???

I surmise that ISI will be using some fancy database format that you'll have to
have some hairy hard and software to even get ASCII out of it.
But the interface to that will, I hope, be files containing
lisp expressions, that can be read with the reader and maybe even
tested by evaling them as is or with some modification.
Its this format that I'd like to see an example of.

There was a question about a published spec that you dodged.
I presume there will be a fixed format, and we'll all want to use it.

Since everybody is going to want to use certain "macros" for helping them
manipulate the stuff, can't we just standardize on those too?
To refer to the original issue,
when an implementor sends you a file, it should say just once
at the top of the file who wrote the tests, and what version of CL
they apply to. Actually a list of versions or range of versions may be more
apropriate.

Since it will be a smaller and less controversial amount of code, we can
just standardize on your implementation rather than haggle over
English descriptions, though I hope your implementation will at least
include doc strings. Will this code be Public Domain, or at least
given out to test contributors?

In a bunch of cases you refer to giving a test form and including
an expected value. The issue arises, how do you compare the two?

My mechanism just uses the full power of CL to do comparisons
in the most natural way. There are not 2 parts to a call,
there's just one. And the kind of comparison is integral with
the call ex: (eq foo foo) 
             (not (eq foo bar))
             (= 1 1.0)
             (equalp "foo" "FOO")
There are lots of comparisons, so don't try to special case each one.
When an error system is settled upon, I hope there will be an errorp fn.

Of course, this ends up testing "EQ" at the same time it tests "FOO",
but I think thats, in general unavoidable.
Anyway if EQ is broken, the implementation doesn't have much of a chance.

You said that each form of a group would be tested and when the first
one fails, you stop the test and declare that "REDUCE" or whatever is
broken. I think we can provide higher resolution than that without
much cost, ie (reduce x y z) is broken.
Such resolution will be very valuable to the bug fixer, and even
for someone evaluating the language. Since you dodged my 
question of "How big is N" by saying "very small" instead of
1 -> 5 or whatever, I can't tell what resolution your mechanism
is really going to provide.

∂25-Jul-86  1036	berman@vaxa.isi.edu 	Re: FSD  
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 25 Jul 86  10:36:30 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA16963; Fri, 25 Jul 86 10:35:43 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607251735.AA16963@vaxa.isi.edu>
Date: 25 Jul 1986 1035-PDT (Friday)
To: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
Cc: cl-validation@SU-AI.ARPA
Subject: Re: FSD
In-Reply-To: Your message of Thu, 24 Jul 1986  20:22 EDT.
             <FAHLMAN.12225351684.BABYL@C.CS.CMU.EDU>


FSD will be used to maintain a number of things relating to our support of CL.
It need not be distributed itself.  The intended use is to help order and keep
track of the various tests.  For example, there may be tests which are
questionable.  They would be in the database, but not readilly accessable for
the purposes of making a test file until they were verified.

Yes, of course it is files that will be distributed.  FSD can be used to help
create the testing files.  I did note on the proposal (which I did not author)
that ISI intends to send a "team" to do the validation at the manufacturer's
site.  Exactly why (except for official reporting) I don't know.

The test suite, as "incarnated" in FSD, will exist as a bunch of objects, each
of which represents a test and some data about the test.  There are not really
files, as such, in FSD.  

If this still sounds scary, let me know.  One of the purposes of all this is
to eventually allow network access to this database (and for other purposes).


RB

∂25-Jul-86  1051	berman@vaxa.isi.edu 	Re: test control   
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 25 Jul 86  10:50:46 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA17112; Fri, 25 Jul 86 10:49:50 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607251749.AA17112@vaxa.isi.edu>
Date: 25 Jul 1986 1049-PDT (Friday)
To: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Cc: cl-validation@SU-AI.ARPA
Subject: Re: test control
In-Reply-To: Your message of Fri, 25 Jul 86 03:47 EDT.
             <860725034736.3.CFRY@DUANE.AI.MIT.EDU>


I sort of thought the notion of a "test unit" would communicate the "N" you
refer to.  Let me be more specific.  N is 1.  But there may be more than one
form.  N here refers to the number of tests of the function/topic being
tested.  Other forms can set things up, etc.  If any form fails, it is THAT
TEST that is reported to have failed, not the entirety of the function/topic.

As for the conversion -- I am mostly working with my organizing database (the
one that will be used to help order the tests) with the CDC stuff as a test
case.

I would sure like to hear more ideas, and from others too.  I think now that I
would modify this testing macro a bit.  I think the "test" proper is in 3
parts.  A setup, the actual test form, and an un-setup.  Obviously only the
test form is required.

I do somewhat like the idea of just using a lisp-form, and if it is supposed
to return some result, just ensure it returns non-nil for "OK".  That is,
using your simpler (pred x y) where pred tests the result, x is the test form,
and y is the desired result.  I still would like to formalize it somewhat into
something that more clearly shows which is the test form and the required
result, as well as the predicate.  See some of the test classes that Marick
describes.  Not all of them care for a result, and I would like that to be
more explicit from the layout of the test text.

I am sorry you feel I am being evasive.  I could just make arbitrary
decisions, but in fact I am relaying all the information, ideas and activities
as they actually are.

RB

∂25-Jul-86  1111	FAHLMAN@C.CS.CMU.EDU 	FSD
Received: from C.CS.CMU.EDU by SAIL.STANFORD.EDU with TCP; 25 Jul 86  11:10:53 PDT
Received: ID <FAHLMAN@C.CS.CMU.EDU>; Fri 25 Jul 86 14:10:54-EDT
Date: Fri, 25 Jul 1986  14:10 EDT
Message-ID: <FAHLMAN.12225546120.BABYL@C.CS.CMU.EDU>
Sender: FAHLMAN@C.CS.CMU.EDU
From: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
To:   berman@λvaxa.isi.edu (Richard Berman)λ
Cc:   cl-validation@SU-AI.ARPA
Subject: FSD
In-reply-to: Msg of 25 Jul 1986  13:35-EDT from berman at vaxa.isi.edu (Richard Berman)


That all sounds fine, as long as you people at ISI are able to cause FSD
to create a file that represents a portable test suite with the
parameters you specify (version of Common Lisp, what areas tested, etc.)
If people can come in over the net and produce such portable files for
their own use, so much the better.

-- Scott

∂25-Jul-86  1127	berman@vaxa.isi.edu 	Re: FSD  
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 25 Jul 86  11:23:13 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA17533; Fri, 25 Jul 86 11:22:38 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607251822.AA17533@vaxa.isi.edu>
Date: 25 Jul 1986 1122-PDT (Friday)
To: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
Cc: cl-validation@SU-AI.ARPA
Subject: Re: FSD
In-Reply-To: Your message of Fri, 25 Jul 1986  14:10 EDT.
             <FAHLMAN.12225546120.BABYL@C.CS.CMU.EDU>


That's my feeling too.  By the way, when you say "versions of common lisp",
just what do you mean?  Are there officially recognized versions?  Or is all
ongoing activity still towards a version 1?

Thanks.

RB

∂25-Jul-86  1254	FAHLMAN@C.CS.CMU.EDU 	FSD
Received: from C.CS.CMU.EDU by SAIL.STANFORD.EDU with TCP; 25 Jul 86  12:54:23 PDT
Received: ID <FAHLMAN@C.CS.CMU.EDU>; Fri 25 Jul 86 15:54:18-EDT
Date: Fri, 25 Jul 1986  15:54 EDT
Message-ID: <FAHLMAN.12225564981.BABYL@C.CS.CMU.EDU>
Sender: FAHLMAN@C.CS.CMU.EDU
From: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
To:   berman@λvaxa.isi.edu (Richard Berman)λ
Cc:   cl-validation@SU-AI.ARPA
Subject: FSD
In-reply-to: Msg of 25 Jul 1986  14:22-EDT from berman at vaxa.isi.edu (Richard Berman)


The assumption is that once we have ANSI/ISO approval for one version,
there will be updates to the standard at periodic and not-too-frequent
intervals. 

-- Scott

∂25-Jul-86  1541	berman@vaxa.isi.edu 	Re: FSD  
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 25 Jul 86  15:40:42 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA20104; Fri, 25 Jul 86 15:39:31 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607252239.AA20104@vaxa.isi.edu>
Date: 25 Jul 1986 1539-PDT (Friday)
To: Fahlman@C.CS.CMU.EDU
Cc: cl-validation@SU-AI.ARPA
Subject: Re: FSD
In-Reply-To: Your message of Fri, 25 Jul 1986  15:54 EDT.
             <FAHLMAN.12225564981.BABYL@C.CS.CMU.EDU>


Thanks, that clears it up for me.

RB

∂26-Jul-86  1447	marick%turkey@gswd-vms.ARPA 	Test suite 
Received: from GSWD-VMS.ARPA by SU-AI.ARPA with TCP; 26 Jul 86  14:47:39 PDT
Received: from turkey.GSD (turkey.ARPA) by gswd-vms.ARPA (5.51/)
	id AA15192; Sat, 26 Jul 86 16:49:06 CDT
Message-Id: <8607262149.AA15192@gswd-vms.ARPA>
Date: Sat, 26 Jul 86 16:49:02 CDT
From: marick%turkey@gswd-vms.ARPA (Brian Marick)
To: berman@vaxa.isi.edu
Cc: cl-validation@su-ai.arpa
In-Reply-To: berman@vaxa.isi.edu's message of 24 Jul 1986 1235-PDT (Thursday)
Subject: Test suite


Equality predicates (mostly digression on test-case syntax):

In any test, you'll have to write down the test case, the expected
results, and the way you test the expected vs. actual results.
The obvious way to do it is

	   (eq (car '(a b c)) 'a)

The way we do it (a way derived from something the DEC people put in
this mailing list a long time ago) is 

	   ( (car '(a b c)) ==> a)

Where the match predicate is implicit (EQUAL).  I like this way better
because it breaks a test down into distinct parts.  That makes it
easier, for example, to print an error message like 
"Test failed with actual result ~A instead of expected result ~A~%".  
If a test is just a lisp form, it will usually look like 
(<match-pred> <test-case> <expected-results>), but "usually" isn't enough.

Once you've got test-forms broken down into separate parts, it just
turns out to be convenient to have one of the parts be the match
function and another to be the type of the test (evaluating,
non-evaluating, error-expecting, etc.)


Compilation:

I wouldn't put off worrying about issues surrounding compilation.
We did just that, and I'm not pleased with the result.  These issues
will affect the whole structure of the test driver, I think, and
ignoring them will, I fear, either lead to throwing away the first
version or living with inadequacy.

∂28-Jul-86  1122	berman@vaxa.isi.edu 	Re: Test suite
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 28 Jul 86  11:21:23 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA07088; Mon, 28 Jul 86 11:19:39 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607281819.AA07088@vaxa.isi.edu>
Date: 28 Jul 1986 1119-PDT (Monday)
To: marick%turkey@gswd-vms.ARPA (Brian Marick)
Cc: cl-validation@su-ai.arpa
Subject: Re: Test suite
In-Reply-To: Your message of Sat, 26 Jul 86 16:49:02 CDT.
             <8607262149.AA15192@gswd-vms.ARPA>


I agree about making the testing predicate a separate part of the test form.
This may become more useful for both analysis and test generation at some
point.

As for compilation -- in the test managers I have received, one generally has
the option of running the tests interpreted, compiled, or both.  There is not
a compile-file option as yet.  I suspect that compile-file should be its own
test, rather than a form of testing.  That is, there will undoubtably be a
mini-suite for testing just compile-file.  As well, there should be a general
sub-suite for testing all forms of compilation.  While it is ad-hoc to test
the compiler by compiling tests not intended to test the compiler, I freely
admit that more subtle bugs are likely to be revealed in this manner for the
very reason that the tests were not intended specifically for compilation.  

Also, there are implementations that only compile, such as ExperLisp.

RB

∂29-Jul-86  1220	@REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU 	Re: test control
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 29 Jul 86  10:34:18 PDT
Received: from MACH.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 41040; Tue 29-Jul-86 03:32:12-EDT
Date: Tue, 29 Jul 86 03:31 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: Re: test control
To: berman@vaxa.isi.edu, cfry@OZ.AI.MIT.EDU
cc: cl-validation@SU-AI.ARPA
In-Reply-To: <8607251749.AA17112@vaxa.isi.edu>
Message-ID: <860729033120.1.CFRY@MACH.AI.MIT.EDU>


    I sort of thought the notion of a "test unit" would communicate the "N" you
    refer to.  Let me be more specific.  N is 1.  But there may be more than one
    form.  N here refers to the number of tests of the function/topic being
    tested.  Other forms can set things up, etc.  If any form fails, it is THAT
    TEST that is reported to have failed, not the entirety of the function/topic.
sounds good.

    I would sure like to hear more ideas, and from others too.  I think now that I
    would modify this testing macro a bit.  I think the "test" proper is in 3
    parts.  A setup, the actual test form, and an un-setup.  Obviously only the
    test form is required.
I usually consider  un-setup to be part of the "setup". 
Say if a test does (setq foo), then the next test is testing 
whether boundp works, As part of the setup I would do (makunbound 'foo).
This means that the current test will not have to rely on everybody else doing
the un-setup properly, which is probably what you have to do anyway.

If all of the unsetups work correctly, then the env should be the same before the
test as it is after, right? This is an awful lot of work your cutting out for yourself.
My proposals in general take into heavy consideration making it easy to write tests,
and making a minimal amount of the system the tests diagnostic controlling program itself
work with just a minimal amount of lisp functioning. It sounds like you're
not operating under the same constraints, but users of the validation suite will be.

    I do somewhat like the idea of just using a lisp-form, and if it is supposed
    to return some result, just ensure it returns non-nil for "OK".  That is,
    using your simpler (pred x y) where pred tests the result, x is the test form,
    and y is the desired result.  I still would like to formalize it somewhat into
    something that more clearly shows which is the test form and the required
    result, as well as the predicate.  See some of the test classes that Marick
    describes.  Not all of them care for a result, and I would like that to be
    more explicit from the layout of the test text.
Ok, I recognize that its nice to be able to find out the various parts of the test,
rather than just have this amorphous lisp exporession that's suppose to return non-nil.
Here's a modified approach that I think will satisfy both of us.
A cheap tester can just evaluate the test and expect to get non-nil.
Most forms will be of the type (pred expression expected-value).
That's pretty simple to parse for error messages and such.
For the don't-care-about-value case, have a function called:
ignore-value.
(defun ignore-value (arg)
  (eval arg)
  t)

If you really need to get explicit, have a function called:
make-test
A call looks like: 
  (make-test pred exp expected-value 
     &key test-id author site-name set-upform un-setup-form error-message compilep ...)

make-test is not quite the right word, because I think evaling it would
perform the test, not just create it. Maybe we should call it
perform-test instead.
If you realy want to give atest a name, there could be a fn
def-test whose args are the same as make-test expect that inserted at
the front is a NAME.

 Anyway the idea is that some hairy database program 
can easily go into the call and extract out all the relevent info.
[actually, its not even so hairy:
  -setup and unsetup default to NIL.
  -if non-list, pred defaults to EQUAL, expected-value defaults to non-nil
  -if list, whose car is not DEF-TEST, pred is car, 
    exp is cadr and expected-value is caddr.
  -if list whose car is DEF-TEST, parse as is obvious.]
  
But some simple program can just run it and it'll do mostly what you want.
the &key args can have appropriate defaults like *site-name* and
*test-author-name*.

My point here is lets use the lisp reader and evaluator, not construct 
a whole new language with its own syntax with "==>" infix operators, 
special names for predicates that duplicate existing cl fns, and such.
Lisp is hip! That's why we're bothering to implement it in the first place!

As for explicit error mesages, using:
"The form ~s evaled to ~s but the expected value was ~s."
Seems pretty complete to me. Nothing in my new proposal makes it hard to
implement such an error message.

    I am sorry you feel I am being evasive.  I could just make arbitrary
    decisions, but in fact I am relaying all the information, ideas and activities
    as they actually are.
Thanks for your concern. Actually I didn't think you were trying to be evasive,
its just that you didn't think that designing the syntax can often simplify
homing in on the exact functionality of the program.

.....
I haven't thought very hard about being able to use the
same test for both compiling and evaling the expression in question.
I agree with whoever said that this should be worked out.
In my above make-test call, I have a var for compilep.
This could take the values T, NIL, or :BOTH, and maybe even
default to :BOTH. 

∂29-Jul-86  1629	berman@vaxa.isi.edu 	Add to list   
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 29 Jul 86  11:11:41 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA17440; Tue, 29 Jul 86 11:11:33 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607291811.AA17440@vaxa.isi.edu>
Date: 29 Jul 1986 1111-PDT (Tuesday)
To: CL-Validation@SU-AI.ARPA
Cc: Cornish%bravo@ti-csl@CSNET-RELAY.ARPA
Subject: Add to list


I am forwarding the message here I received to the correct person.
RB

------- Forwarded Message

Return-Path: <CORNISH%Bravo%ti-csl.csnet@CSNET-RELAY.ARPA>
Received: from CSNET-RELAY.ARPA (csnet-pdn-gw.arpa) by vaxa.isi.edu (4.12/4.7)
	id AA11007; Mon, 28 Jul 86 17:09:37 pdt
Received: from ti-csl by csnet-relay.csnet id ar02252; 28 Jul 86 19:56 EDT
Received: from Bravo (bravo.ARPA) by tilde id AA12392; Mon, 28 Jul 86 17:08:11 cdt
To: berman@vaxa.isi.edu
Cc: 
Subject:        CL Validation Mailing List
Date:           28-Jul-86 17:05:11
From: CORNISH%Bravo%ti-csl.csnet@CSNET-RELAY.ARPA
Message-Id:     <CORNISH.2731961109@Bravo>

I would like to be added to the CL Validation Suite mailing list.


------- End of Forwarded Message

∂31-Jul-86  0834	marick%turkey@gswd-vms.ARPA 	Lisp conference 
Received: from GSWD-VMS.ARPA by SAIL.STANFORD.EDU with TCP; 31 Jul 86  08:34:35 PDT
Received: from turkey.GSD (turkey.ARPA) by gswd-vms.ARPA (5.51/)
	id AA00287; Thu, 31 Jul 86 10:34:01 CDT
Message-Id: <8607311534.AA00287@gswd-vms.ARPA>
Date: Thu, 31 Jul 86 10:33:56 CDT
From: marick%turkey@gswd-vms.ARPA (Brian Marick)
To: cl-validation@su-ai.arpa, berman@vaxa.isi.edu
Subject: Lisp conference


Several people interested in CL validation will be at the Lisp
conference.  Perhaps it would be a good idea if Richard Berman were to
buy us all lunch.  Failing that, perhaps we should go to lunch on our
own tab -- or othertimewise get together.

Brian Marick

∂31-Jul-86  1034	berman@vaxa.isi.edu 	Re: Lisp conference
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 31 Jul 86  10:34:39 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA05383; Thu, 31 Jul 86 10:33:44 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607311733.AA05383@vaxa.isi.edu>
Date: 31 Jul 1986 1033-PDT (Thursday)
To: marick%turkey@gswd-vms.ARPA (Brian Marick)
Cc: cl-validation@su-ai.arpa, berman@vaxa.isi.edu
Subject: Re: Lisp conference
In-Reply-To: Your message of Thu, 31 Jul 86 10:33:56 CDT.
             <8607311534.AA00287@gswd-vms.ARPA>

As for Richard Berman buying Lunch - I don't know how ISI would feel about
that, but I'll check.  I am trying to prune my stay to one day, so which
should it be.  I really need to know by today if possible, or friday morning
at worst.  Based on the responses of those interested in the validation
effort, I will decide how long (and which day(s)) to stay.  

So when would y'all like to get together?

Best,

RB

∂01-Aug-86  1348	berman@vaxa.isi.edu 	Conference    
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 1 Aug 86  13:48:06 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA15784; Fri, 1 Aug 86 13:47:46 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608012047.AA15784@vaxa.isi.edu>
Date:  1 Aug 1986 1347-PDT (Friday)
To: cl-validation@su-ai.arpa
Cc: 
Subject: Conference


Hey gang, I'm going to be at the conference to meet with any and all parties
interested in the Validation effort.  I may only be around on Monday (but
Tuesday is a possibility) and I would like to meet for lunch after the morning
session.  I assume I'll be wearing some kind of ID badge to identify myself as
Richard Berman from ISI.

I'll bring along a few hardcopies of the ISI proposal outlining our intended
support activities.

I really would like to meet everyone who is working on testing implementations
and other issues like this.

See ya.

RB

∂11-Aug-86  1122	berman@vaxa.isi.edu 	Thanks   
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 11 Aug 86  11:22:52 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA02567; Mon, 11 Aug 86 11:23:02 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608111823.AA02567@vaxa.isi.edu>
Date: 11 Aug 1986 1122-PDT (Monday)
To: CL-Validation@su-ai.arpa
Cc: 
Subject: Thanks


Thanks to the folks I spoke with at the conference.  The main thing I got from
this is the concept of an ordering macro to facilitate test groups which must
execute in a specific sequence.  

I would like to know if there is any more commentary, questions, suggestions,
etc. regarding the test macro?

RB

∂13-Aug-86  1130	berman@vaxa.isi.edu 	Test Control  
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 13 Aug 86  11:29:52 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA10629; Wed, 13 Aug 86 11:30:10 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608131830.AA10629@vaxa.isi.edu>
Date: 13 Aug 1986 1130-PDT (Wednesday)
To: cl-validation@su-ai.arpa
Cc: 
Subject: Test Control


On 29 July Fry proposed a control scheme including a "compilep" option which
would be T, Nil or :BOTH, possibly defaulting to :BOTH.  This would be present
for each test.

I feel that this is unnecessary because Common Lisp is supposed to yield the
same results compiled or interpreted.  At least, that is my understanding.  Is
there any intentional instances where this is not true?

Each test (or ordered series of tests) should be runnable in either form, so I
believe the control for testing compilation should be more global.

What do you think?

RB

∂19-Aug-86  0039	@REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU 	Test Control    
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 19 Aug 86  00:39:39 PDT
Received: from JONES.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 43318; Tue 19-Aug-86 03:41:06-EDT
Date: Tue, 19 Aug 86 03:42 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: Test Control
To: berman@vaxa.isi.edu, cl-validation@SU-AI.ARPA
In-Reply-To: <8608131830.AA10629@vaxa.isi.edu>
Message-ID: <860819034224.7.CFRY@JONES.AI.MIT.EDU>



    On 29 July Fry proposed a control scheme including a "compilep" option which
    would be T, Nil or :BOTH, possibly defaulting to :BOTH.  This would be present
    for each test.

    I feel that this is unnecessary because Common Lisp is supposed to yield the
    same results compiled or interpreted.  At least, that is my understanding.  Is
    there any intentional instances where this is not true?
Well, modulo some recent debate, macro-expand time is different.
Effectively, macro-expand time for compiled functions is the same as definition time.
But for evaled fns, macro-expand time is the same as run time.

But basically you're right. So long as we can easily run a whole set of tests
either evaled, compiled, or both, we don't need to indicate that in each test.
The error messages should definitely say wheather the call failed in compiled or
evaled mode.

    Each test (or ordered series of tests) should be runnable in either form, so I
    believe the control for testing compilation should be more global.

    What do you think?

    RB
In my diagnostic system, I'd like to have the local control.
One reason is so that I can explicitely label a test that has a bug in it.
[and maybe only the compiled version of a call would have the bug.]

If there was a convienient syntax for declaring a test
evaled, compiled, both, or under global control [with global control being the default,
and with BOTH being the global-control's default]
then I'd make use of it.

∂19-Aug-86  1135	berman@vaxa.isi.edu 	Re: Test Control   
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 19 Aug 86  11:35:02 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA12279; Tue, 19 Aug 86 11:35:26 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608191835.AA12279@vaxa.isi.edu>
Date: 19 Aug 1986 1135-PDT (Tuesday)
To: CL-Validation@su-ai.arpa
Subject: Re: Test Control
In-Reply-To: Your message of Tue, 19 Aug 86 03:42 EDT.
             <860819034224.7.CFRY@JONES.AI.MIT.EDU>


Re: Fry's idea of having a flag for GLOBAL/LOCAL control, with LOCAL allowing
specification of compiled, evaled or both (for testing), I like it.

I suggest that the flag be a keyword called :CONTROL with the values :GLOBAL,
:COMPILE or :EVAL, where :GLOBAL means that the global test controller will
decide whether the test is compiled and/or evald, and the other two values are
a "compile only" or "eval only" specifier, overriding the global control.  I
don't think that :BOTH is necessary as this seems to be identical to :GLOBAL,
meaning that the test may be compiled and/or evaled.

NOTE:  I am experimenting with a macro now that includes all the best features
we have seemed to agree upon.  I am including the above feature, but naturally
it can be changed.  In a few days I will post this preliminary macro.  It is
not really a control macro, but simply defines the test in terms of the data
base.  Currently I am using generic common-lisp for this organizing macro, and
I am not using FSD.  Instead it creates a simpler database using lists, arrays
and property lists.  This database is for testing only and the actual
organizing macro may stray from pure CL because it is intended for internal
use only.  Of course, the files generated from the database will contain only
"pure" CL for testing purposes.

RB

∂20-Aug-86  0604	hpfclp!hpfcjrd!diamant@hplabs.HP.COM 	Re: Test Control 
Received: from HPLABS.HP.COM by SAIL.STANFORD.EDU with TCP; 20 Aug 86  06:03:39 PDT
Received: by hplabs.HP.COM ; Wed, 20 Aug 86 04:43:35 pdt
From: John Diamant <hpfclp!hpfcjrd!diamant@hplabs.HP.COM>
Received: from hpfcjrd.UUCP; Tue, 19 Aug 86 13:26:12
Received: by hpfcjrd; Tue, 19 Aug 86 13:26:12 mdt
Date: Tue, 19 Aug 86 13:26:12 mdt
To: cl-validation@sail.stanford.edu
Subject: Re: Test Control

> Subject: Test Control
> From: Christopher Fry <hplabs!cfry@OZ.AI.MIT.EDU>
> 
> Well, modulo some recent debate, macro-expand time is different.
> Effectively, macro-expand time for compiled functions is the same as definition time.
> But for evaled fns, macro-expand time is the same as run time.

For evaled functions, it is unspecified in Common Lisp.  This has been
discussed at great length on the CL mailing list, so I won't repeat it here,
but this is a potential source for problems in test runs.  If an implementation
chooses to handle macro expansion the way you suggest (most do), then the
semantics truly are different.  On our implementation, where we chose to
have consistent interpreter and compiler semantics with regard to
macroexpansion, any problems we encountered with expansion time were the same
whether we ran interpreted or compiled.


John Diamant
Systems Software Operation	UUCP:  {ihnp4!hpfcla,hplabs}!hpfclp!diamant
Hewlett Packard Co.		ARPA/CSNET: diamant%hpfclp@hplabs.HP.COM
Fort Collins, CO

∂21-Aug-86  1352	berman@vaxa.isi.edu 	Purpose of Test Suite   
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 21 Aug 86  13:52:46 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA29758; Thu, 21 Aug 86 13:53:10 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608212053.AA29758@vaxa.isi.edu>
Date: 21 Aug 1986 1353-PDT (Thursday)
To: CL-Validation@su-ai.arpa
Cc: 
Subject: Purpose of Test Suite


Now that I have been experimenting a bit, I have come up against a question
that is a bit difficult to decide upon.  From my understanding, I am putting
together a VALIDATION suite, the purpose of which is to determine the presence
operating status of all the CL functions, variables, features, etc.

Is it also supposed to thoroughly test these things?

That is, is this same suite responsible for determining such things as correct
operation at boundary conditions?  How about esoteric interactions?  

In the test of the "+" operation, what would you include?   Obviously you want
to be sure that it works for each data type (and combination of data types)
that it is defined for.  Also you want to make sure that positive/negative is
handled, etc.  Beyond that, should it also check to see if, for example,
MOST-POSITIVE-FIXNUM + 1 causes an error?  How about (+ 1 (1-
MOST-POSITIVE-FIXNUM)) causes no error?  And so on for each of the number-type
boundaries.

RB

∂21-Aug-86  1738	FAHLMAN@C.CS.CMU.EDU 	Purpose of Test Suite  
Received: from C.CS.CMU.EDU by SAIL.STANFORD.EDU with TCP; 21 Aug 86  17:38:45 PDT
Received: ID <FAHLMAN@C.CS.CMU.EDU>; Thu 21 Aug 86 20:37:13-EDT
Date: Thu, 21 Aug 1986  20:37 EDT
Message-ID: <FAHLMAN.12232694383.BABYL@C.CS.CMU.EDU>
Sender: FAHLMAN@C.CS.CMU.EDU
From: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
To:   berman@λvaxa.isi.edu (Richard Berman)λ
Cc:   CL-Validation@SU-AI.ARPA
Subject: Purpose of Test Suite
In-reply-to: Msg of 21 Aug 1986  16:53-EDT from berman at vaxa.isi.edu (Richard Berman)


I agree that this is supposed to be a validation suite, and not a
comprehensive debugging suite.  It should test that everything is there,
that it basically all works, and should especially stress those things
that might be the subject of misunderstandings.  It is necessary to test
whether you can add a flonum to a bignum; it is not necessary to
test a few thousand pairs of random integers to make sure that the +
operator works for all of them.

-- Scott

∂22-Aug-86  0124	@REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU 	Purpose of Test Suite
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 22 Aug 86  01:24:29 PDT
Received: from JONES.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 43711; Fri 22-Aug-86 01:50:01-EDT
Date: Fri, 22 Aug 86 01:49 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: Purpose of Test Suite
To: berman@vaxa.isi.edu, CL-Validation@SU-AI.ARPA
In-Reply-To: <8608212053.AA29758@vaxa.isi.edu>
Message-ID: <860822014957.9.CFRY@JONES.AI.MIT.EDU>


    Now that I have been experimenting a bit, I have come up against a question
    that is a bit difficult to decide upon.  From my understanding, I am putting
    together a VALIDATION suite, the purpose of which is to determine the presence
    operating status of all the CL functions, variables, features, etc.

    Is it also supposed to thoroughly test these things?
If there's much of a difference, we're in big trouble.
If somebody's implementation supports adding of all
integers except (+ 27491 -31200001), we can't be expected to find that out with the
validation suite.

    That is, is this same suite responsible for determining such things as correct
    operation at boundary conditions?  How about esoteric interactions?  

    In the test of the "+" operation, what would you include?   Obviously you want
    to be sure that it works for each data type (and combination of data types)
    that it is defined for.  Also you want to make sure that positive/negative is
    handled, etc.  Beyond that, should it also check to see if, for example,
    MOST-POSITIVE-FIXNUM + 1 causes an error?  How about (+ 1 (1-
    MOST-POSITIVE-FIXNUM)) causes no error?  And so on for each of the number-type
    boundaries.
I think the broader question you're asking is:
Should the validation suite simply test that things work the way they're supposed to
when they're suppose to, or should it also make sure that things DON'T WORK when they're
not suppose to work.
You can obviously expand either catagory to available memory.
For + on non-negative integers, I'd test:
(+)
(+ 0)
(+ 0 0)
(+ 2 3 4 5 6 7)
(+ nil) => should error
(+ "one") => another error case wouldn't hurt
Checking the cases using most-positive-fixnum is 
a good idea and does appear to be necessary.
It's a lot of nit-picking work, though.
I'm glad I'm in MY sandals.

∂22-Aug-86  0125	@REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU 	Re: Test Control
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 22 Aug 86  01:24:29 PDT
Received: from JONES.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 43710; Fri 22-Aug-86 01:39:48-EDT
Date: Fri, 22 Aug 86 01:39 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: Re: Test Control
To: berman@vaxa.isi.edu, CL-Validation@SU-AI.ARPA
In-Reply-To: <8608191835.AA12279@vaxa.isi.edu>
Message-ID: <860822013939.8.CFRY@JONES.AI.MIT.EDU>

    Received: from MC.LCS.MIT.EDU by OZ.AI.MIT.EDU via Chaosnet; 19 Aug 86 14:51-EDT
    Received: from SAIL.STANFORD.EDU by MC.LCS.MIT.EDU 19 Aug 86 14:48:11 EDT
    Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 19 Aug 86  11:35:02 PDT
    Received: by vaxa.isi.edu (4.12/4.7)
	    id AA12279; Tue, 19 Aug 86 11:35:26 pdt
    From: berman@vaxa.isi.edu (Richard Berman)
    Message-Id: <8608191835.AA12279@vaxa.isi.edu>
    Date: 19 Aug 1986 1135-PDT (Tuesday)
    To: CL-Validation@su-ai.arpa
    Subject: Re: Test Control
    In-Reply-To: Your message of Tue, 19 Aug 86 03:42 EDT.
		 <860819034224.7.CFRY@JONES.AI.MIT.EDU>


    Re: Fry's idea of having a flag for GLOBAL/LOCAL control, with LOCAL allowing
    specification of compiled, evaled or both (for testing), I like it.

    I suggest that the flag be a keyword called :CONTROL with the values :GLOBAL,
    :COMPILE or :EVAL, where :GLOBAL means that the global test controller will
    decide whether the test is compiled and/or evald, and the other two values are
    a "compile only" or "eval only" specifier, overriding the global control.
Almost right.
     I
    don't think that :BOTH is necessary as this seems to be identical to :GLOBAL,
    meaning that the test may be compiled and/or evaled.
Nope. :GLOBAL should mean, get the kind of testing from the global variable
     *global-test-kind* which make take on the values:
     :eval, :compile, or :both.
The question is, should the local version be able to say :compile when the global
version says :eval and visa-versa?
Maybe in that case, that test would simply not get run.
[Say, something that only works compiled, and you're running all the tests
knowing that the compiler is completely broken, so don't run any compioed tests.]
Maybe GLOBAL should have precidence?

I know you say everything should work under compiled and evaled and for
strickly VALIDATION purposes, you shouldn't need any of this.
But it would be useful if the same format for validation was
useful for code development.  For one thing, it would simply get used
more and we'd get more validation tests.
For another, it would help developers.

    NOTE:  I am experimenting with a macro now that includes all the best features
    we have seemed to agree upon.  I am including the above feature, but naturally
    it can be changed.  In a few days I will post this preliminary macro.  It is
    not really a control macro, but simply defines the test in terms of the data
    base.  Currently I am using generic common-lisp for this organizing macro, and
    I am not using FSD. 
Right on!
    Instead it creates a simpler database using lists, arrays
    and property lists.  This database is for testing only and the actual
    organizing macro may stray from pure CL because it is intended for internal
    use only.  Of course, the files generated from the database will contain only
    "pure" CL for testing purposes.
sounds good.


∂22-Aug-86  1054	berman@vaxa.isi.edu 	Re: Test Control   
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 22 Aug 86  10:54:33 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA06074; Fri, 22 Aug 86 10:54:47 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608221754.AA06074@vaxa.isi.edu>
Date: 22 Aug 1986 1054-PDT (Friday)
To: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Cc: CL-Validation@SU-AI.ARPA
Subject: Re: Test Control
In-Reply-To: Your message of Fri, 22 Aug 86 01:39 EDT.
             <860822013939.8.CFRY@JONES.AI.MIT.EDU>


I am still not sure why :BOTH is needed.  I beleive that the purpose here is
to have individual tests be able to specify a limitation on how they may be
run.  Obviously the vast majority of tests can be run either :COMPILEd or
:EVALed.  It is only the rare test that must limit this with the inclusion of
a :EVAL or :COMPILE option.  I recommend changing these names to :EVAL-ONLY
and :COMPILE-ONLY to clarify the meanings.

The test controller could be told to run every test compile, evaled, or both.
Perhaps it would be useful to also say "run only the EVAL-ONLY tests", etc.
Does this seem useful? If not, please clarify for me just how :BOTH is
different from the union of :EVAL and :COMPILE.

Thanks

RB

∂24-Aug-86  1940	marick%turkey@gswd-vms.ARPA 	Purpose of Test Suite
Received: from GSWD-VMS.ARPA by SAIL.STANFORD.EDU with TCP; 24 Aug 86  19:39:54 PDT
Received: from turkey.GSD (turkey.ARPA) by gswd-vms.ARPA (5.51/)
	id AA10093; Sun, 24 Aug 86 21:39:51 CDT
Message-Id: <8608250239.AA10093@gswd-vms.ARPA>
Date: Sun, 24 Aug 86 21:40:22 CDT
From: marick%turkey@gswd-vms.ARPA (Brian Marick)
To: berman@vaxa.isi.edu, cl-validation@su-ai.arpa
Subject: Purpose of Test Suite


The validation suite should check that a Common Lisp system adheres to
the letter of the specification.  I don't see that that's particularly
different from any test suite.

Of course, you quickly run into combinatorial explosion, so you have to
narrow your scope.  Checking boundary conditions is known to be an
awfully productive way of testing, both because programmers often make
errors around boundaries and also because boundary condition tests can
be written quickly, without much thought.


Once the next version of the CL definition is available, it might be
useful to use it to drive the test suite.  I could see something like
this:

Each "unit" of specification would contain a pointer to the appropriate
test.  For example, the specification for #'+ will say that it takes 0
or more arguments.  That sentence will point to a test that gives #'+ 
0 arguments and Lambda-Parameters-Limit arguments (the boundary
conditions).  The FSD database ought to be able to support this.

It might also be useful to have a list of stock values to use for
testing.  Each datatype contains classes of "equivalent values", and
these stock values would be the boundary values.  For example, the stock
values for type fixnum would be most-negative-fixnum, -1, 0, +1, and
most-positive-fixnum.  In some string tests I whipped off not too long
back, I used three stock strings: a simple-string, a string with one
level of displacement, a string with two levels of displacement,
including a displacement offset and a fill-pointer. (Guess what I was
testing.)  These stock values have the advantage that they eliminate
some of the thinking required per test.  The disadvantage is that they
institutionalize gaps in your test coverage.

I don't know that this is practical at this late date.

Brian Marick




∂25-Aug-86  1221	berman@vaxa.isi.edu 	TEST MACRO    
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 25 Aug 86  12:20:45 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA02269; Mon, 25 Aug 86 12:21:34 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608251921.AA02269@vaxa.isi.edu>
Date: 25 Aug 1986 1221-PDT (Monday)
To: CL-Validation@su-ai.arpa
Cc: 
Subject: TEST MACRO


Here is the current version of the test macro stuff.  Note that this is an
organizing macro, to create the database.  The variable LIST-OF-ITEMS is not
defined here - it contains a listing of all the CL function, macro, variable
names, etc.

I am not 100% happy with the current version, and I look forward to your
suggestions.  Remember, this creates a data base.  The main requisite is that
this macro must embody all the necessary info for the management and running
of the tests.  My next message will contain some samples.



;; -*-  Mode:Common-Lisp; Base: 10; Package:cl-tests  -*-

(in-package 'cl-tests)

(defvar *list-of-test-names* nil)

(defvar *list-of-test-seq-names* nil)

; ADD-TEST does the work of putting the test into the database.
; It doesn not do any testing.
; NOTE: This version is for testing.  It doesn not use FSD,
; but should work in any Common lisp.  See DEFTEST for
; descriptions of the arguments.

(defmacro add-test (item name type contrib$ setup testform unsetup
		    failform error$ doc$ name-add control)
  (putprop name item 'test-of) 	; note what it is a test of.
  (putprop name type 'test-type)
  (putprop name contrib$ 'test-contributor)
  (putprop name setup 'test-setup)
  (putprop name testform 'test-form)
  (putprop name unsetup 'test-unsetup)
  (putprop name failform 'test-failform)
  (putprop name error$ 'test-error$)
  (putprop name doc$ 'test-doc$)
  (putprop name control 'test-control)
  (and name-add
       (putprop item (cons name (get  item 'tests)) 'tests)
       (push name *list-of-test-names*))
  `',name)


; DEFTEST is used to define a test.  It puts the test into a database.
; The arguments are:

; ITEM which is one of the common lisp function names, variables, macro names, 
;      etc. or a subject name.  The name must be present in the organizing
;      database.

; NAME must be a unique symbol for this test.

; TYPE is optional, defaulting to ORDINARY.  It must be one of NOEVAL,
;      EVAL or ERROR.  ORDINARY means the testform eval section is 
;      evaluated and compared (using the indicated compare in the testform)
;      with the unevaluated compare section.  EVAL means both halves 
;      are evaluated and compared.  ERROR means the form should produce
;      an error.

; TESTFORM is the test form, composed of 1 or 3 parts.  If this is 
;      and ERROR test, TESTFORM is an expresion which must produce
;      an error.  Otherwise there are 3 parts.  The first is the 
;      eval form, which is evaluated.  The second is a form which
;      can be used as a function by APPLY, taking two arguments and
;      used to compare the results of the eval form with the third
;      part of the TESTFORM, the compare form.  The compare form is
;      either evalutated (type EVAL) or not (type NOEVAL).

; The remaining arguments are optional, referenced by keywords.  They are:
  
; :CONTRIB$ is a documentation string showing the originator of the test.
; If unspecified or NIL it gets its value from CL-TESTS:*CONTRIB$*

; :FAILFORM is a form to evaluate in the event that an unexpected error
;      was generated, or the comparison failed.

; :ERROR$ is a string to print out if the comparison fails.

; :SETUP is a form to evaluate before TESTFORM.

; :UNSETUP is a form to evaluate after TESTFORM.

; :DOC$ is a string documenting this test.  If not specified (or nil) it
; gets it value from the global variable CL-TESTS:*DOC$*

; :CONTROL may be any of :GLOBAL, :EVAL or :COMPILE.  If it is :GLOBAL,
;     it means that the test controller will decide when/if to eval and
;     compile the test.  If it is :EVAL, then the test will ignore 
;     controller attempts to compile it, and if it is :COMPILE the
;     controller cannot eval it.  The default is :GLOBAL.

(defvar *CONTRIB$* nil)
(defvar *DOC$* nil)

(defmacro DEFTEST ((item name &optional (type 'noeval)) testform
		   &key (contrib$ *contrib$*) (failform nil) (error$ nil) (setup nil)
		   (unsetup nil) (doc$ nil) (name-add t)(control :GLOBAL))
  (cond ((null(memq item list-of-items))
	 (error "'~s' is not a CL item or subject.~%" item))
	((null(memq type '(noeval eval error)))
	 (error "The test-type ~s is not one of NOEVAL, EVAL or ERROR."))
	((null(stringp contrib$))
	 (error "The contributor, ~s, must be a string." contrib$))
	((null(or (null error$) (stringp error$)))
	 (error ":ERROR$ must be a string."))
	((null (or (null doc$) (stringp doc$)))
	 (error ":DOC$ must be a string."))
	((null (memq control '(:GLOBAL :EVAL :COMPILE)))
	 (error ":CONTROL must be one of :GLOBAL, :EVAL or :COMPILE."))
	((memq name *list-of-test-names*)
	 (error "The test name ~s has already been used!" name)))
  `(add-test ,item ,name ,type ,*contrib$*
	     ,setup ,testform ,unsetup  ,failform ,error$
	     ,(or doc$ *doc$*) ,name-add ,control))  ; put it on the item.


; The format for test sequences is:

; (DEFTEST-SEQ (item seq-name)
;              (((test-name <type>) testform <key-word data>)
;               ((test-name <type>) testform <key-word data>) ... )
;              :CONTRIB$ <contributor-string>
;              :SETUP <setup form>
;              :UNSETUP <unsetup form>
;              :DOC$ <documentation string>

(defmacro add-test-seq (item seq-name test-names contrib$ setup unsetup doc$)
  (putprop seq-name item 'test-seq-of)
  (putprop seq-name contrib$ 'test-seq-contributor)
  (putprop seq-name setup 'test-seq-setup)
  (putprop seq-name test-names 'test-seq-names)
  (putprop seq-name unsetup 'test-seq-unsetup)
  (putprop seq-name doc$ 'test-seq-doc$)
  (putprop item (nconc (get item 'test-seqs) (list seq-name)) 'test-seqs)
  (push seq-name *list-of-test-seq-names*)
  `',seq-name)

(defmacro add-1-seq (item a-test contrib$)
  `(deftest (,item ,@ (car a-test))
	    ,(second  a-test)
	    :contrib$ , contrib$
	    ,@ (cddr a-test)
	    :name-add nil))


(defmacro DEFTEST-SEQ ((item seq-name) test-seq
		       &key (contrib$ *contrib$*) (setup nil) (unsetup nil) (doc$ *doc$*))
  (cond ((null(memq item list-of-items))
	 (error "'~s' is not a CL item or subject.~%" item))
	((null(stringp contrib$))
	 (error "The contributor must be a string."))
	((null (or (null doc$) (stringp doc$)))
	 (error ":DOC$ must be a string."))
	((memq seq-name *list-of-test-seq-names*)
	 (error "The test-sequence name ~s has already been used!" seq-name)))
  (let (test-names)
    (dolist (a-test test-seq)
      (setq test-names
	    (nconc test-names
		   (list (eval `(add-1-seq ,item ,a-test ,contrib$))))))
    `(add-test-seq ,item
		   ,seq-name
		   ,test-names
		   ,contrib$
		   ,setup
		   ,unsetup
		   ,doc$)))

∂25-Aug-86  1225	berman@vaxa.isi.edu 	Test-Macro examples
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 25 Aug 86  12:24:41 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA02290; Mon, 25 Aug 86 12:25:39 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608251925.AA02290@vaxa.isi.edu>
Date: 25 Aug 1986 1225-PDT (Monday)
To: CL-Validation@su-ai.arpa
Cc: 
Subject: Test-Macro examples


Here are some samples.  They are transliterated from the CDC test suite, so
please, no flames over content.

;; -*-  Mode:Common-Lisp; Base: 10; Package:cl-tests  -*-

(in-package 'cl-tests)

;*******************************************************************

;; ACONS test.

(setq *contrib$* "CDC.  Test case written by Richard Hufford.")
(setq *doc$* nil)

(deftest
  (acons acons-1)
  ((acons 'frog 'amphibian nil) equal (frog . amphibian))
  :doc$ "ACONS to NIL")
  
(deftest
  (acons acons-2)
  ((acons 'frog
	  'amphibian
	  '((duck . bird)(goose . bird)(dog . mammal)))
   equal
   (frog . amphibian)(duck . bird)(goose . bird)(dog . mammal))
  :doc$ "acons to a-list")

(deftest
  (acons acons-3)
  ((acons 'frog nil nil) equal ((frog)))
  :doc$ "acons nil datum")

(deftest
  (acons acons-4)
  ((acons 'frog
	  '(amphibian warts webbed-feet says-ribbet)
	  nil)
   equal
   ((frog . (amphibian warts webbed-feet says-ribbet))))
  :doc "acons with list datum")

;*******************************************************************

;; ACOSH test.

(deftest-seq
  (acosh cdc-acosh-tests)
  (((acosh-1)
    ((ACOSH  1.0000) ACOSH-P   0.0000))
   ((acosh-2)
    ((ACOSH  1.0345) ACOSH-P   0.26193))
   ((acosh-3)
    ((ACOSH  1.1402) ACOSH-P   0.5235))
   ((acosh-4)
    ((ACOSH  1.3246) ACOSH-P   0.7854))
   ((acosh-5)
    ((ACOSH  1.6003) ACOSH-P   1.0472))
   ((acosh-6)
    ((ACOSH  1.9863) ACOSH-P   1.3090))
   ((acosh-7)
    ((ACOSH  2.5092) ACOSH-P   1.5708))
   ((acosh-8)
    ((ACOSH  3.2051) ACOSH-P   1.8326))
   ((acosh-9)
    ((ACOSH  4.1219) ACOSH-P   2.0944))
   ((acosh-10)
    ((ACOSH  5.3228) ACOSH-P   2.3562))
   ((acosh-11)
    ((ACOSH  6.8906) ACOSH-P   2.6180))
   ((acosh-12)
    ((ACOSH  8.9334) ACOSH-P   2.8798))
   ((acosh-13)
    ((ACOSH 11.5920) ACOSH-P   3.1416))
   ((acosh-14)
    ((ACOSH 15.0497) ACOSH-P   3.4034))
   ((acosh-15)
    ((ACOSH 19.5448) ACOSH-P   3.6652))
   ((acosh-16)
    ((ACOSH 25.3871) ACOSH-P   3.9270))
   ((acosh-17)
    ((ACOSH 32.9794) ACOSH-P   4.1888))
   ((acosh-18)
    ((ACOSH 42.8450) ACOSH-P   4.4506))
   ((acosh-19)
    ((ACOSH 55.6640) ACOSH-P   4.7124))
   ((acosh-20)
    ((ACOSH 72.3200) ACOSH-P   4.9742))
   ((acosh-21)
    ((ACOSH 93.9611) ACOSH-P   5.2360)))
  :setup (DEFUN ACOSH-P (ARG1 ARG2)
	   (PROG (RES)
		 (COND ((= ARG1 ARG2) (RETURN T))
		       ((= ARG2 0.0) (RETURN (AND (> ARG1 -1E-9)
						  (< ARG1 1E-9))))
		       (T (SETQ RES (/ ARG1 ARG2))
			  (RETURN (AND (> RES 0.9999)
				       (< RES 1.0001)))))))
  :unsetup (fmakunbound 'acosh-p)
  :contrib$ "CDC.  Test case written by BRANDON CROSS, SOFTWARE ARCHITECTURE AND ENGINEERING"
  :doc$ nil)

∂25-Aug-86  1255	berman@vaxa.isi.edu 	Purpose  
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 25 Aug 86  12:55:26 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA02551; Mon, 25 Aug 86 12:56:25 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608251956.AA02551@vaxa.isi.edu>
Date: 25 Aug 1986 1256-PDT (Monday)
To: CL-Validation@su-ai.arpa
Cc: 
Subject: Purpose


>From Fahlman I get that the purpose is basically to see that the spec (or
whatever) is checked, rather than a sweepingly deep test.  Marick seems to
feel that checking "that a Common Lisp system adheres to the letter of the
specification" is not "particularly different from any test suite".  Yet it
seems that a vendor's test suite (and I have reviewed about 6 major ones now)
is more designed towards the testing of both adherance to spec and specific
areas of interest/problems in that implementation.

Marick's comments re "stock values" seems somewhat useful. Certainly adding
zero and -1 is sufficient to test the handling of both zero and -1 for
addition.  I don't then need to add -7 and 2 to test for correct handling of
negatives.  Fahlman basically said that testing the functions for each of the
data types it should handle was important.  I think that this (data type
handling) and boundary conditions pretty much sum up the nature of the
validation suite which therefore should:

    1.  Test for the presence of all Common Lisp pre-defined objects.
    2.  Test for correct definition by:
         a.  Testing for the data type (i.e. Function, Constant, etc.) of each
of these objects.
         b.  Evaluating constants and variables for correct value.
         c.  Applying functions/macros to a sufficienty broad range of
arguments so as to ascertain the functionality for each type of argument and
combination of types.

Also, a few interraction tests are in order.  By this I mean the testing of
more complex forms, and I am thinking specifically of scoping.

Obviously this test suite will not cover in any way extensions made to the
language.  I know that such things as error handling and object oriented
programming are being addressed, but so far these very important areas remain
undetermined.  Should I also make this same data base (and its corresponding
test-file making utilities, etc.) available for this vendor-specific use?  I
don't even know if I CAN do this without some kind of semi-legal hassle
because at present all contributions are public domain.  But it would be nice
to have the same test format for everything.  

As I must use FSD, I cannot easily give away the actual database stuff.  So
far it is all in straight CL, but this is only because FSD is not yet running
on the TI explorer.  This is imminent, but I will try (no promise) to keep
some kind of CL version of the database stuff around.  If it gets too complex
(which is what FSD is good at handling) I may have to cease working on a
straigh CL version.

So.............

Comments????  Is this the correct statement of the purpose and direction I
should use in putting this thing together?

Thanks.
RB

∂27-Aug-86  0041	@REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU 	TEST MACRO 
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 27 Aug 86  00:41:02 PDT
Received: from JONES.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 44090; Wed 27-Aug-86 03:43:31-EDT
Date: Wed, 27 Aug 86 03:42 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: TEST MACRO
To: berman@vaxa.isi.edu, CL-Validation@SU-AI.ARPA
In-Reply-To: <8608251921.AA02269@vaxa.isi.edu>
Message-ID: <860827034240.5.CFRY@JONES.AI.MIT.EDU>



    (defmacro add-test (item name type contrib$ setup testform unsetup
			failform error$ doc$ name-add control)
      (putprop name item 'test-of) 	; note what it is a test of.
      (putprop name type 'test-type)
      (putprop name contrib$ 'test-contributor)
      (putprop name setup 'test-setup)
      (putprop name testform 'test-form)
      (putprop name unsetup 'test-unsetup)
      (putprop name failform 'test-failform)
      (putprop name error$ 'test-error$)
      (putprop name doc$ 'test-doc$)
Usually doc should default to something
      (putprop name control 'test-control)
      (and name-add
	   (putprop item (cons name (get  item 'tests)) 'tests)
	   (push name *list-of-test-names*))
      `',name)



    ; TESTFORM is the test form, composed of 1 or 3 parts.  If this is 
    ;      and ERROR test, TESTFORM is an expresion which must produce
    ;      an error.  Otherwise there are 3 parts.  The first is the 
    ;      eval form, which is evaluated.  The second is a form which
    ;      can be used as a function by APPLY, taking two arguments and
    ;      used to compare the results of the eval form with the third
    ;      part of the TESTFORM, the compare form.
I prefer lisp syntax. compare form first! Then test form, then expected result.
Make it look like a function call, ie a list of 3 elements.
Infix is good for mathematicians who don't understand elegant syntax.

  The compare form is
    ;      either evalutated (type EVAL) or not (type NOEVAL).
Always evaluate it. Specify no-eval by putting a quote in front of it!
[not necessary in case its a number, string, character, keyword, etc.]


    ; :FAILFORM is a form to evaluate in the event that an unexpected error
    ;      was generated, or the comparison failed.
How about have  the default prints to *error-output* a composed message like:
"In test FROBULATOR, (foo) should have returned 2 but returned 3 instead."

    ; :ERROR$ is a string to print out if the comparison fails.
Do we need both failform and error$ ?
If the test fails, evaluate the value of :failform, which prints out the standard message.
Its rare when you'd want to do something other than the default.
Maybe it would be good to have the default behavior come from
global var *test-fail-action*, so someone could generate their own
format of reporting bugs.

    ; :SETUP is a form to evaluate before TESTFORM.

    ; :UNSETUP is a form to evaluate after TESTFORM.

    ; :DOC$ is a string documenting this test.  If not specified (or nil) it
    ; gets it value from the global variable CL-TESTS:*DOC$*
Which itself defaults to "" .

    ; :CONTROL may be any of :GLOBAL, :EVAL or :COMPILE.  If it is :GLOBAL,
    ;     it means that the test controller will decide when/if to eval and
    ;     compile the test.  If it is :EVAL, then the test will ignore 
    ;     controller attempts to compile it, and if it is :COMPILE the
    ;     controller cannot eval it.  The default is :GLOBAL.
Sounds good. Actually your names of :eval-only and :compile-only are
clearer, but just so long as we all agree upon the semantics.

    ; (DEFTEST-SEQ (item seq-name)

 I'd hope most of the time to never have to see a call to
 deftest-seq. Something should just go over a whole file
 and make it one big call to deftest-seq.
 But its nice to have for obscure cases and non-file modularity.

I notice some dollar sign suffixes in the code.
How about a DIAG package to avoid name conflicts?
Of course, the package system has to be working for you to run your
diagnostics, but ...

∂27-Aug-86  1211	berman@vaxa.isi.edu 	TEST MACRO - Fry's Comments  
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 27 Aug 86  12:11:01 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA18922; Wed, 27 Aug 86 12:11:08 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608271911.AA18922@vaxa.isi.edu>
Date: 27 Aug 1986 1211-PDT (Wednesday)
To: CL-Validation@su-ai.arpa
Cc: 
Subject: TEST MACRO - Fry's Comments


doc$ DOES default to a global value.  

As for TESTFORM -- Lisp syntax is already, with the following proviso: It must
be of the form (predicate arg1 arg2) where predicate is an object which can be
applied to arg1 and arg2.  I.e. (not(eq arg1 arg2)) is no good.  But (neq arg1
arg2) is ok.  The exception (per the comments in my code) is an ERROR type of
test.

"Always evaluate it [the compare form]".  I took my current default
from Marick (that is, the compare form is not evaluated unless you specify
EVAL) after looking over a lot of different companies' test suites.  By FAR
the vast majority of tests were of the NOEVAL variety.  This will almost
certainly stand as the default.

:FAILFORM is very different from ERROR$.  Per my original posting regarding
the macro, FAILFORM is optional (and will be rarely used at this point).  It
is to help analyze an error (or pattern of errors) further.  It is used for
testing beyond the "first order", where "first order" means simple error
testing.  For example, one may wish for a :FAILFORM to maintain a list of
tests that have failed for a later analysis.  :ERROR$ is simply a message to
print out.  Actually, it might be nice if :ERROR$ was a format string with
some kind of argument capability, but this may be dangerous in a testing
environment since FORMAT is such a hairy function.  

I like the idea of a global default *TEST-FAIL-ACTION*.  I would then add an
:IF-FAIL keyword.  This is different from :FAILFORM in that :FAILFORM is sort
of an :AFTER mix-in for the standard test-fail-action (or maybe a :BEFORE???,
any preferance?) rather than a replacement for the standard fail action.
:IF-FAIL would therefore allow one to replace the standard test-fail action,
which :FAILFORM would be the "mix-in" to the fail action.  This is a useful
separation, especially when prototyping tests where :FAILFORM may not change
at the same rate as :IF-FAIL.  I hope this paragraph is clear.  Whew.

Yeah, we'll go to :COMPILE-ONLY and :EVAL-ONLY, with the previously defined
semantics, ok?

As for DEFTEST-SEQ...it is very necessary, and came about as a direct result
of working with existing test suites.  This is used when you have auxiliary
functions, macros, variables, etc., which must exist at the the time the
sequence of tests is run.  It is not always used just for ordering tests.  For
example, in the CDC suite they have a function for comparing two numbers
within a certain tolerance which is used as part of the test for #'+.  All the
tests of #'+ use this as the predicate.  So, all the #'+ tests are wrapped in
a DEFTEST-SEQ with the definition of this predicate in the :SETUP slot.  In
this case, the actual temporal sequence of the tests is unimportant.  Another
use for DEFTEST-SEQ is when the test sequence is itself important.

Don't forget that each test will become an object in a database, and an
extraction routine will build the files which you will then load as a test
suite.  Thus with this paradigm, you MUST associate any auxilary environmental
factors as part of the relevant tests, otherwise there is no way at
file-building time to determine what predicates should be defined where.

As you said, "It's nice to have for...non-file modularity", which is exactly
the case.

As for dollar-sign suffixes -- that's a holdover from BASIC, and is short for
"string".  It isn't an attempt to avoid name conflicts.  HOWEVER...I have been
meaning to stick all this stuff in its own package anyway.

And, yeah, the package system has to be working, but...


Thanks a lot for your comments.  To summarize, the things I agree with:
Prefix syntax for TESTFORM, with the mentioned proviso.  Some kind of global
*TEST-FAIL-ACTION*.  Using the names :EVAL-ONLY and :COMPILE-ONLY.  A Package
for test stuff.  I disargree with: Always evaluating the compare form.  And,
lastly, your comments on :FAILFORM and :ERROR$, and DEFTEST-SEQ may be due to
some misunderstanding of an earlier message.

Sha-Boom.

RB

∂28-Aug-86  1308	@REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU 	TEST MACRO - Fry's Comments    
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 28 Aug 86  13:08:00 PDT
Received: from JONES.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 44202; Thu 28-Aug-86 16:10:56-EDT
Date: Thu, 28 Aug 86 16:09 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: TEST MACRO - Fry's Comments
To: berman@vaxa.isi.edu, CL-Validation@SU-AI.ARPA
In-Reply-To: <8608271911.AA18922@vaxa.isi.edu>
Message-ID: <860828160939.2.CFRY@JONES.AI.MIT.EDU>




    Thanks a lot for your comments.  To summarize, the things I agree with:
    Prefix syntax for TESTFORM, with the mentioned proviso.  Some kind of global
    *TEST-FAIL-ACTION*.  Using the names :EVAL-ONLY and :COMPILE-ONLY.  A Package
    for test stuff.
Good.
    I disargree with: Always evaluating the compare form.  And,
    lastly, your comments on :FAILFORM and :ERROR$, and DEFTEST-SEQ may be due to
    some misunderstanding of an earlier message.
I think the real thrust of my arguments was just to try to cut down the number of
keyword args in this test macro, and thus make it easier to remember what's going on.
Always evaling the comparison form cuts out the :eval-compare-form, and
just having one action taken when a test fails cuts out one of
:failform or :error$. You'll be using the test stuff more than anyone so you
will have implimentors myopia disease which is:
"You can remember all this stuff because you work with it daily."
But you also have the insight from being most experienced with the problem
and have the distinct advantage of implementing the code.
Please consider us less-frequent users when you add a new and/or confusing
feature [where confusing means non-lisp like].


Fry

∂08-Sep-86  1408	berman@vaxa.isi.edu 	TEST MACRO    
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 8 Sep 86  14:07:51 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA28187; Mon, 8 Sep 86 13:45:05 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8609082045.AA28187@vaxa.isi.edu>
Date:  8 Sep 1986 1344-PDT (Monday)
To: CL-VAlidation@su-ai.arpa
Subject: TEST MACRO


Well, this file incorporates the changes discussed.  This will probably be the
final version, unless serious flames ensue.  After this is settled, I will
begin to develop the run-time test controller.  So...Start letting me know
your ideas.  I already have a lot of my own from looking over the
contributions, and these ideas have been incorporated into this test macro so
that enough info to drive the test controller is present.

Here we go...

;; -*-  Mode:Common-Lisp; Base: 10; Package:cl-tests  -*-

; This file contains the code for the test-defining macros.
; It does not contain the organizing data base.  It is 
; assumed that this organizing information is loaded prior
; to this file.  The following comes from the database creating
; code, and describes the database.



;;; Each "level" of the database, where level corresponds to a section in
;;; the book, is a vector.  For example, at the "top" level, we have 
;;; 25 elements because the book is in 25 chapters.  
;;; Each element of the array contains a cons.  The car contains a list
;;; of symbols which are present and described in exactly that level.
;;; The Cdr is an array of the next lower level, being the subparts.
;;; In chapter one, anything defined before section 1.1 is in the car
;;; of the first element of the top-level vector.  The cdr of this
;;; element is a vector with 2 elements, because section one contains
;;; section 1.1 and 1.2.  This pattern repeats, with section 1.2 having
;;; an element with its cdr being a vector of 15 elements, because section
;;; 2 has 15 subsections.

;;; In addition, each subject has the SUBJECT attribute with a T value.
;;; There is also a LIST-OF-SUBJECTS containing these atoms.

; The "subjects" above mean atoms which do not refer to CL function names,
; variables, constants, etc., but rather to concepts such as SCOPING.
; All these items (function and macro names, subjects, etc.) are kept
; on a list called *LIST-OF-ITEMS*


(in-package 'cl-tests)

(defvar *list-of-test-names* nil)

(defvar *list-of-test-seq-names* nil)

; ADD-TEST does the work of putting the test into the database.
; It does not do any testing.
; NOTE: This version is for testing.  It does not use FSD,
; but should work in any Common lisp.  See DEFTEST for
; descriptions of the arguments.

(defmacro add-test (item name type contrib$ setup testform unsetup
		    failform error$ doc$ name-add control)
  (putprop name item 'test-of) 	; note what it is a test of.
  (putprop name type 'test-type)
  (putprop name contrib$ 'test-contributor)
  (putprop name setup 'test-setup)
  (putprop name testform 'test-form)
  (putprop name unsetup 'test-unsetup)
  (putprop name failform 'test-failform)
  (putprop name error$ 'test-error$)
  (putprop name doc$ 'test-doc$)
  (putprop name control 'test-control)
  (and name-add
       (putprop item (cons name (get  item 'tests)) 'tests)
       (push name *list-of-test-names*))
  `',name)


; DEFTEST is used to define a test.  It puts the test into a database.
; The arguments are:

; ITEM which is one of the common lisp function names, variables, macro names, 
;      etc. or a subject name.  The name must be present in the organizing
;      database.

; NAME must be a unique symbol for this test.

; TYPE is optional, defaulting to ORDINARY.  It must be one of NOEVAL,
;      EVAL or ERROR.  ORDINARY means the testform eval section is 
;      evaluated and compared (using the indicated compare in the testform)
;      with the unevaluated compare section.  EVAL means both halves 
;      are evaluated and compared.  ERROR means the form should produce
;      an error.

; TESTFORM is the test form, composed of 1 or 3 parts.  If this is 
;      and ERROR test, TESTFORM is an expresion which must produce
;      an error.  Otherwise there are 3 parts.  The first is the 
;      eval form, which is evaluated.  The second is a form which
;      can be used as a function by APPLY, taking two arguments and
;      used to compare the results of the eval form with the third
;      part of the TESTFORM, the compare form.  The compare form is
;      either evalutated (type EVAL) or not (type NOEVAL).

; The remaining arguments are optional, referenced by keywords.  They are:
  
; :CONTRIB$ is a documentation string showing the originator of the test.
; If unspecified or NIL it gets its value from CL-TESTS:*CONTRIB$*

; :FAILFORM is a form to evaluate in the event that an unexpected error
;      was generated, or the comparison failed.

; :ERROR$ is a string to print out if the comparison fails.

; :SETUP is a form to evaluate before TESTFORM.

; :UNSETUP is a form to evaluate after TESTFORM.

; :DOC$ is a string documenting this test.  If not specified (or nil) it
; gets it value from the global variable CL-TESTS:*DOC$*

; :CONTROL may be any of :GLOBAL, :EVAL-ONLY or :COMPILE-ONLY.  If it is :GLOBAL,
;     it means that the test controller will decide when/if to eval and
;     compile the test.  If it is :EVAL, then the test will ignore 
;     controller attempts to compile it, and if it is :COMPILE the
;     controller cannot eval it.  The default is :GLOBAL.

(defvar *CONTRIB$* nil)
(defvar *DOC$* nil)

(defmacro DEFTEST ((item name &optional (type 'noeval)) testform
		   &key (contrib$ *contrib$*) (failform nil) (error$ nil) (setup nil)
		   (unsetup nil) (doc$ nil) (name-add t)(control :GLOBAL))
  (cond ((null(memq item list-of-items))
	 (error "'~s' is not a CL item or subject.~%" item))
	((null(memq type '(noeval eval error)))
	 (error "The test-type ~s is not one of NOEVAL, EVAL or ERROR."))
	((null(stringp contrib$))
	 (error "The contributor, ~s, must be a string." contrib$))
	((null(or (null error$) (stringp error$)))
	 (error ":ERROR$ must be a string."))
	((null (or (null doc$) (stringp doc$)))
	 (error ":DOC$ must be a string."))
	((null (memq control '(:GLOBAL :EVAL-ONLY :COMPILE-ONLY)))
	 (error ":CONTROL must be one of :GLOBAL-ONLY, :EVAL-ONLY or :COMPILE."))
	((memq name *list-of-test-names*)
	 (error "The test name ~s has already been used!" name)))
  `(add-test ,item ,name ,type ,*contrib$*
	     ,setup ,testform ,unsetup  ,failform ,error$
	     ,(or doc$ *doc$*) ,name-add ,control))  ; put it on the item.


; The format for test sequences is:

; (DEFTEST-SEQ (item seq-name)
;              (((test-name <type>) testform <key-word data>)
;               ((test-name <type>) testform <key-word data>) ... )
;              :CONTRIB$ <contributor-string>
;              :SETUP <setup form>
;              :UNSETUP <unsetup form>
;              :DOC$ <documentation string>

(defmacro add-test-seq (item seq-name test-names contrib$ setup unsetup doc$)
  (putprop seq-name item 'test-seq-of)
  (putprop seq-name contrib$ 'test-seq-contributor)
  (putprop seq-name setup 'test-seq-setup)
  (putprop seq-name test-names 'test-seq-names)
  (putprop seq-name unsetup 'test-seq-unsetup)
  (putprop seq-name doc$ 'test-seq-doc$)
  (putprop item (nconc (get item 'test-seqs) (list seq-name)) 'test-seqs)
  (push seq-name *list-of-test-seq-names*)
  `',seq-name)

(defmacro add-1-seq (item a-test contrib$)
  `(deftest (,item ,@ (car a-test))
	    ,(second  a-test)
	    :contrib$ , contrib$
	    ,@ (cddr a-test)
	    :name-add nil))


(defmacro DEFTEST-SEQ ((item seq-name) test-seq
		       &key (contrib$ *contrib$*) (setup nil) (unsetup nil) (doc$ *doc$*))
  (cond ((null(memq item list-of-items))
	 (error "'~s' is not a CL item or subject.~%" item))
	((null(stringp contrib$))
	 (error "The contributor must be a string."))
	((null (or (null doc$) (stringp doc$)))
	 (error ":DOC$ must be a string."))
	((memq seq-name *list-of-test-seq-names*)
	 (error "The test-sequence name ~s has already been used!" seq-name)))
  (let (test-names)
    (dolist (a-test test-seq)
      (setq test-names
	    (nconc test-names
		   (list (eval `(add-1-seq ,item ,a-test ,contrib$))))))
    `(add-test-seq ,item
		   ,seq-name
		   ,test-names
		   ,contrib$
		   ,setup
		   ,unsetup
		   ,doc$)))

∂09-Sep-86  1504	berman@vaxa.isi.edu 	Correct Test Macro 
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 9 Sep 86  15:04:21 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA08143; Tue, 9 Sep 86 15:04:38 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8609092204.AA08143@vaxa.isi.edu>
Date:  9 Sep 1986 1504-PDT (Tuesday)
To: CL-Validation@su-ai.arpa
Cc: 
Subject: Correct Test Macro


The posting of the previous day contained incorrect code for the test macro
documentation, indicating that the predicate in a testform was the second
sub-form, when in fact it is now the first.  The following is corrected:


;; -*-  Mode:Common-Lisp; Base: 10; Package:cl-tests  -*-

; This file contains the code for the test-defining macros.
; It does not contain the organizing data base.  It is 
; assumed that this organizing information is loaded prior
; to this file.  The following comes from the database creating
; code, and describes the database.



;;; Each "level" of the database, where level corresponds to a section in
;;; the book, is a vector.  For example, at the "top" level, we have 
;;; 25 elements because the book is in 25 chapters.  
;;; Each element of the array contains a cons.  The car contains a list
;;; of symbols which are present and described in exactly that level.
;;; The Cdr is an array of the next lower level, being the subparts.
;;; In chapter one, anything defined before section 1.1 is in the car
;;; of the first element of the top-level vector.  The cdr of this
;;; element is a vector with 2 elements, because section one contains
;;; section 1.1 and 1.2.  This pattern repeats, with section 1.2 having
;;; an element with its cdr being a vector of 15 elements, because section
;;; 2 has 15 subsections.

;;; In addition, each subject has the SUBJECT attribute with a T value.
;;; There is also a LIST-OF-SUBJECTS containing these atoms.

; The "subjects" above mean atoms which do not refer to CL function names,
; variables, constants, etc., but rather to concepts such as SCOPING.
; All these items (function and macro names, subjects, etc.) are kept
; on a list called *LIST-OF-ITEMS*


(in-package 'cl-tests)

(defvar *list-of-test-names* nil)

(defvar *list-of-test-seq-names* nil)

; ADD-TEST does the work of putting the test into the database.
; It does not do any testing.
; NOTE: This version is for testing.  It does not use FSD,
; but should work in any Common lisp.  See DEFTEST for
; descriptions of the arguments.

(defmacro add-test (item name type contrib$ setup testform unsetup
		    failform error$ doc$ name-add control)
  (putprop name item 'test-of) 	; note what it is a test of.
  (putprop name type 'test-type)
  (putprop name contrib$ 'test-contributor)
  (putprop name setup 'test-setup)
  (putprop name testform 'test-form)
  (putprop name unsetup 'test-unsetup)
  (putprop name failform 'test-failform)
  (putprop name error$ 'test-error$)
  (putprop name doc$ 'test-doc$)
  (putprop name control 'test-control)
  (and name-add
       (putprop item (cons name (get  item 'tests)) 'tests)
       (push name *list-of-test-names*))
  `',name)


; DEFTEST is used to define a test.  It puts the test into a database.
; The arguments are:

; ITEM which is one of the common lisp function names, variables, macro names, 
;      etc. or a subject name.  The name must be present in the organizing
;      database.

; NAME must be a unique symbol for this test.

; TYPE is optional, defaulting to ORDINARY.  It must be one of NOEVAL,
;      EVAL or ERROR.  ORDINARY means the testform eval section is 
;      evaluated and compared (using the indicated compare in the testform)
;      with the unevaluated compare section.  EVAL means both halves 
;      are evaluated and compared.  ERROR means the form should produce
;      an error.

; TESTFORM is the test form, composed of 1 or 3 parts.  If this is 
;      and ERROR test, TESTFORM is an expresion which must produce
;      an error.  Otherwise there are 3 parts.  The first is a form 
;      which can be used as a function by APPLY, taking two arguments
;      and used to compare the results of the eval and compare forms.
;      The second form is the eval form, which is evaluated.  
;      The compare form is either evalutated (type EVAL) or not 
;      (type NOEVAL).

; The remaining arguments are optional, referenced by keywords.  They are:
  
; :CONTRIB$ is a documentation string showing the originator of the test.
; If unspecified or NIL it gets its value from CL-TESTS:*CONTRIB$*

; :FAILFORM is a form to evaluate in the event that an unexpected error
;      was generated, or the comparison failed.

; :ERROR$ is a string to print out if the comparison fails.

; :SETUP is a form to evaluate before TESTFORM.

; :UNSETUP is a form to evaluate after TESTFORM.

; :DOC$ is a string documenting this test.  If not specified (or nil) it
; gets it value from the global variable CL-TESTS:*DOC$*

; :CONTROL may be any of :GLOBAL, :EVAL-ONLY or :COMPILE-ONLY.  If it is :GLOBAL,
;     it means that the test controller will decide when/if to eval and
;     compile the test.  If it is :EVAL, then the test will ignore 
;     controller attempts to compile it, and if it is :COMPILE the
;     controller cannot eval it.  The default is :GLOBAL.

(defvar *CONTRIB$* nil)
(defvar *DOC$* nil)

(defmacro DEFTEST ((item name &optional (type 'noeval)) testform
		   &key (contrib$ *contrib$*) (failform nil) (error$ nil) (setup nil)
		   (unsetup nil) (doc$ nil) (name-add t)(control :GLOBAL))
  (cond ((null(memq item list-of-items))
	 (error "'~s' is not a CL item or subject.~%" item))
	((null(memq type '(noeval eval error)))
	 (error "The test-type ~s is not one of NOEVAL, EVAL or ERROR."))
	((null(stringp contrib$))
	 (error "The contributor, ~s, must be a string." contrib$))
	((null(or (null error$) (stringp error$)))
	 (error ":ERROR$ must be a string."))
	((null (or (null doc$) (stringp doc$)))
	 (error ":DOC$ must be a string."))
	((null (memq control '(:GLOBAL :EVAL-ONLY :COMPILE-ONLY)))
	 (error ":CONTROL must be one of :GLOBAL-ONLY, :EVAL-ONLY or :COMPILE."))
	((memq name *list-of-test-names*)
	 (error "The test name ~s has already been used!" name)))
  `(add-test ,item ,name ,type ,*contrib$*
	     ,setup ,testform ,unsetup  ,failform ,error$
	     ,(or doc$ *doc$*) ,name-add ,control))  ; put it on the item.


; The format for test sequences is:

; (DEFTEST-SEQ (item seq-name)
;              (((test-name <type>) testform <key-word data>)
;               ((test-name <type>) testform <key-word data>) ... )
;              :CONTRIB$ <contributor-string>
;              :SETUP <setup form>
;              :UNSETUP <unsetup form>
;              :DOC$ <documentation string>

(defmacro add-test-seq (item seq-name test-names contrib$ setup unsetup doc$)
  (putprop seq-name item 'test-seq-of)
  (putprop seq-name contrib$ 'test-seq-contributor)
  (putprop seq-name setup 'test-seq-setup)
  (putprop seq-name test-names 'test-seq-names)
  (putprop seq-name unsetup 'test-seq-unsetup)
  (putprop seq-name doc$ 'test-seq-doc$)
  (putprop item (nconc (get item 'test-seqs) (list seq-name)) 'test-seqs)
  (push seq-name *list-of-test-seq-names*)
  `',seq-name)

(defmacro add-1-seq (item a-test contrib$)
  `(deftest (,item ,@ (car a-test))
	    ,(second  a-test)
	    :contrib$ , contrib$
	    ,@ (cddr a-test)
	    :name-add nil))


(defmacro DEFTEST-SEQ ((item seq-name) test-seq
		       &key (contrib$ *contrib$*) (setup nil) (unsetup nil) (doc$ *doc$*))
  (cond ((null(memq item list-of-items))
	 (error "'~s' is not a CL item or subject.~%" item))
	((null(stringp contrib$))
	 (error "The contributor must be a string."))
	((null (or (null doc$) (stringp doc$)))
	 (error ":DOC$ must be a string."))
	((memq seq-name *list-of-test-seq-names*)
	 (error "The test-sequence name ~s has already been used!" seq-name)))
  (let (test-names)
    (dolist (a-test test-seq)
      (setq test-names
	    (nconc test-names
		   (list (eval `(add-1-seq ,item ,a-test ,contrib$))))))
    `(add-test-seq ,item
		   ,seq-name
		   ,test-names
		   ,contrib$
		   ,setup
		   ,unsetup
		   ,doc$)))

∂12-Sep-86  1259	berman@vaxa.isi.edu 	Test Stuff    
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 12 Sep 86  12:57:18 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA01052; Fri, 12 Sep 86 12:58:24 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8609121958.AA01052@vaxa.isi.edu>
Date: 12 Sep 1986 1258-PDT (Friday)
To: CL-Validation@su-ai.arpa
Cc: 
Subject: Test Stuff


Well.....

No comments about the test macro, so does this imply agreement?

And...

I am working on the run-time stuff.  The most OBVIOUS snag is error catching.
I don't think this can be 100% portable at this time, so what I want is a
survey of the most basic error catching mechanisms that implementors are using
in their common-lisps.  Does everyone have errorset?  I will keep any replies
that are sent directly to me confidential.  Please do respond.  

Best,

RB

∂12-Sep-86  1431	franz!binky!layer@kim.Berkeley.EDU 	Re: Test Stuff     
Received: from [128.32.130.7] by SAIL.STANFORD.EDU with TCP; 12 Sep 86  14:31:26 PDT
Received: by kim.Berkeley.EDU (5.53/1.16)
	id AA24583; Fri, 12 Sep 86 14:31:50 PDT
Received: from binky by franz (5.5/3.14)
	id AA10810; Fri, 12 Sep 86 13:37:11 PDT
Received: by binky (4.12/3.14)
	id AA06259; Fri, 12 Sep 86 13:38:44 pdt
From: franz!binky!layer@kim.Berkeley.EDU (Kevin Layer)
Return-Path: <binky!layer>
Message-Id: <8609122038.AA06259@binky>
To: kim!vaxa.isi.edu!berman (Richard Berman)
Cc: CL-Validation@su-ai.arpa
Subject: Re: Test Stuff 
In-Reply-To: Your message of Fri, 12 Sep 86 12:58:00 PDT.
             <8609121958.AA01052@vaxa.isi.edu> 
Date: Fri, 12 Sep 86 13:38:41 PDT

Franz Inc's Common Lisp has errorset, and we are awaiting the agreed
upon standard before we go any further.

	Kevin

∂16-Sep-86  1425	berman@vaxa.isi.edu 	Running Tests 
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 16 Sep 86  14:25:18 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA19503; Tue, 16 Sep 86 14:26:29 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8609162126.AA19503@vaxa.isi.edu>
Date: 16 Sep 1986 1426-PDT (Tuesday)
To: CL-Validation@su-ai.arpa
Cc: 
Subject: Running Tests


The tests generated from the database will look nearly identical to the
DEFTEST format. The names are changed to protect the innocent.  Thus, RUNTEST
and RUNTEST-SEQ.

Oh yeah, so far my informal survey shows that it should be possible to create
an ERRSET type function using the existing stuff from current implementations.
If you think you *cannot* do so, let me know!  I am going to use ERRSET in the
test controller (so far).

Anyway, the main reason for having definition-time and run-time tests be
similar in format is simple.  Once you have the test controller (which will be
vanilla CL except for ERRSET) you can run any files of tests you want which
correspond to the required format (to be explained soon).  But, the ISI suite
will be for vanilla CL, and you will also want to test your flavored stuff.
So...you can use the same test controller, and have the advantage that if your
tests could be included in the ISI suite, they are only a short edit away from
being in the correct format.  That is, once the test format is posted and the
pre pre pre alpha run-time controller is available, you should try to write
any further tests using the DEFTEST (or RUNTEST) format.

See ya,

RB

∂16-Sep-86  1821	FAHLMAN@C.CS.CMU.EDU 	Running Tests
Received: from C.CS.CMU.EDU by SAIL.STANFORD.EDU with TCP; 16 Sep 86  18:16:46 PDT
Received: ID <FAHLMAN@C.CS.CMU.EDU>; Tue 16 Sep 86 20:09:21-EDT
Date: Tue, 16 Sep 1986  20:07 EDT
Message-ID: <FAHLMAN.12239504802.BABYL@C.CS.CMU.EDU>
Sender: FAHLMAN@C.CS.CMU.EDU
From: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
To:   berman@λvaxa.isi.edu (Richard Berman)λ
Cc:   CL-Validation@SU-AI.ARPA
Subject: Running Tests
In-reply-to: Msg of 16 Sep 1986  17:26-EDT from berman at vaxa.isi.edu (Richard Berman)


It might be a good idea for you to document exactly what this ERRSET
function is supposed to do and what its syntax is.  I'm not sure that
this is the same in every Lisp in the world.

-- Scott

∂17-Sep-86  1044	berman@vaxa.isi.edu 	Re: Running Tests  
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 17 Sep 86  10:44:05 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA27776; Wed, 17 Sep 86 10:44:45 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8609171744.AA27776@vaxa.isi.edu>
Date: 17 Sep 1986 1044-PDT (Wednesday)
To: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
Cc: CL-Validation@SU-AI.ARPA
Subject: Re: Running Tests
In-Reply-To: Your message of Tue, 16 Sep 1986  20:07 EDT.
             <FAHLMAN.12239504802.BABYL@C.CS.CMU.EDU>




>It might be a good idea for you to document exactly what this ERRSET
>function is supposed to do and what its syntax is.  I'm not sure that
>this is the same in every Lisp in the world.
>
>-- Scott

Yeah.  So far as I'm concerned, all it need do is something like this:

(ERRSET <FORM>) where it evaluates <FORM> and returns it result in a list.
E.g. (ERRSET (+ 1 2)) returns (3).  If any error occurs while evaluating
<FORM>, then ERRSET returns NIL. 

Most implementations have an optional second argument which controls the
printing of error messages.  I think the default is to print them.

Is this sufficient?

RB

∂17-Sep-86  1437	berman@vaxa.isi.edu 	Running Tests 
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 17 Sep 86  14:36:52 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA29741; Wed, 17 Sep 86 14:37:49 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8609172137.AA29741@vaxa.isi.edu>
Date: 17 Sep 1986 1437-PDT (Wednesday)
To: CL-VALIDATION@SU-AI.ARPA
Cc: 
Subject: Running Tests


Last week I reviewed some of my ideas with a few other ISIers and got some
good suggestions.  Such as:

The ERROR$ is a format string.  It is given certain arguments, such as the
result of evaluating the EVALFORM, the COMPAREFORM (or its resultant
evaluation if not :NOEVAL), the name of the test, the result of applying
evalform and compareform with the predicate, and the contributor string.  The
order I am experimenting with is name, evalform, compareform, result,
contributor.

This makes for a useful reporting mechanism.  Fry suggested that FAILFORM
could be used for this, but I wanted a default for this sort of failure
handling (i.e. to just print out a message) as it is the most typical sort of
handling.  Note that there is a default error printing mechanism below this if
you don't want to specify your own ERROR$.

In running my *highly* experimental test controller, it immediately becomes
obvious that there are several points of possible failure.  1) When evaluating
EVALFORM, an error may be generated; 2) same thing for COMPAREFORM; 3) same
thing for applying the predicate.  That is, the application may cause an
error; 4) The predicate fails to return T.

For each of these a message is printed so that we know what stage the test was
in.  In addition, there are default values for the evalform, compareform and
result to indicate that they have not yet been reached in the test process.

The last case is interesting.  I propose to extend the definition of a test
predcate as follows:  Test predicates return T if the test was successful and
some *useful* value if not.  That is, the result should help figure out why
the predicate failed.  What say you?  This could be used very effectively in
conjunction with ERROR$.

Best,

RB

∂19-Sep-86  1334	berman@vaxa.isi.edu 	Floating Point Suite    
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 19 Sep 86  13:34:28 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA19783; Fri, 19 Sep 86 13:35:43 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8609192035.AA19783@vaxa.isi.edu>
Date: 19 Sep 1986 1335-PDT (Friday)
To: CL-VALIDATION@su-ai.arpa
Cc: 
Subject: Floating Point Suite


In putting together a little mini-suite, (mostly to test my run-time stuff) I
have found an interesting problem.  Here is an example:

(add-p (+ 300 20 1 0 .1 .02 .003 .0004 .00005 .000006 (-(+ 321 0.123456)))
       -7.27595E-12)

ADD-P compares two numbers to see if they are within a certain threshold.
This is a test of floating-point add adapted from the CDC test suite.  As you
can see, the first expression should come out to 0.0, which it does on my TI.
The problem is that apparently on the CDC -7.2759e-12 is close enough to the
value produced by the first expression.  Clearly this is an unacceptable test
because it is implementation dependent.  The second expression should be 0.0
with add-p testing for an acceptable difference.

The problem is that the floating point stuff is naturally a little inacurate,
but that CL has various global values and whatnot that are supposed to allow
the programmer to know certain things about the floating point as implemented.

The solution: A more general test for floating point (or a set of predicates
to compare floating point numbers) which utilize this information to make an
implementation independent test.

A few vendors have, at one time or another, given offers of help.  I would
first like suggestions on an approach to this (I am *not* a floating-point
whiz) understanding that we are talking about testing both the floating point
correctness within the bounds given by these floating point parameters
available in CL, and the various routines which accept FP numbers.

I also would like somebody to accept the assignment of writing a general FP
test after the above is figured out.

OK?

Best,

RB

∂19-Sep-86  1901	RWK@YUKON.SCRC.Symbolics.COM 	Floating Point Suite
Received: from SCRC-YUKON.ARPA by SAIL.STANFORD.EDU with TCP; 19 Sep 86  19:01:27 PDT
Received: from WHITE-BIRD.SCRC.Symbolics.COM by YUKON.SCRC.Symbolics.COM via CHAOS with CHAOS-MAIL id 90376; Fri 19-Sep-86 22:01:04 EDT
Date: Fri, 19 Sep 86 22:01 EDT
From: Robert W. Kerns <RWK@YUKON.SCRC.Symbolics.COM>
Subject: Floating Point Suite
To: Richard Berman <berman@vaxa.isi.edu>
cc: CL-VALIDATION@SU-AI.ARPA
In-Reply-To: <8609192035.AA19783@vaxa.isi.edu>
Message-ID: <860919220125.2.RWK@WHITE-BIRD.SCRC.Symbolics.COM>

    Date: 19 Sep 1986 1335-PDT (Friday)
    From: berman@vaxa.isi.edu (Richard Berman)


    In putting together a little mini-suite, (mostly to test my run-time stuff) I
    have found an interesting problem.  Here is an example:

    (add-p (+ 300 20 1 0 .1 .02 .003 .0004 .00005 .000006 (-(+ 321 0.123456)))
	   -7.27595E-12)
Amazing.  By most standards that's a pretty large error,
for no more arithmetic than is going on there!

It's worth noting that there are two things being tested by
this test:  the reader, and the arithmetic.  It would be best
to arrange to test these separately, by carefully arranging
to get various floating-point numbers into the machine with
the only assumption being that they're represented as binary.
[No, I don't mean to hack their representation.  The binaryness
is visible in such things as (+ (- 1 (float 1/3)) (- 1 (float 2/3))),
which will give you a different answer on a base-3 machine].

This will let you test what values get read for certain critical
values.  (You should also test the printer, and especially that
what you print is the same number that you read later.)  Later,
you do arithmetic on them, and compare what you get with other
numbers you had on-line.

I'm not a floating-point wizard either, so I can't take this idea
much further.

∂22-Sep-86  0857	hpfclp!paul@hplabs.HP.COM 	Floating Point Testing 
Received: from HPLABS.HP.COM by SAIL.STANFORD.EDU with TCP; 22 Sep 86  08:56:20 PDT
Received: by hplabs.HP.COM ; Mon, 22 Sep 86 08:55:25 pdt
Date: Mon, 22 Sep 86 08:55:25 pdt
From: hpfclp!paul@hplabs.HP.COM
To: CL-Validation@su-ai.ARPA
Subject: Floating Point Testing


Here is a **SIMPLE** function that I have used to test if 2 floating point
numbers are "about" equal. The two DEFCONSTANT's should be computed from
some of the constants in CLtL for more accurate comparisons. I just wanted a
quick and dirty way to compare expressions that produced floating point
results.

     Paul Beiser
     HP Ft. Collins, CO


    (defconstant *tolerance* 1.0E-10)
    (defconstant *eps* 1.0E-14)

    (defun approx= (result approx-value) 
      ;;
      ;; See if 2 numbers are approximately =. If either of the numbers
      ;; are 0, special care must be taken to ensure the validity of the
      ;; tests. For example,
      ;;
      ;;    (sin PI/2) is mathematically equal to zero, but the library
      ;;    routine may return a result that is less than *eps* away.
      ;;
      (if (and (numberp result) (numberp approx-value))
        (cond ((and (complexp result) (complexp approx-value))
	       (and (approx= (realpart result) (realpart approx-value))
		    (approx= (imagpart result) (imagpart approx-value))))
	      ((and (zerop result) (not (zerop approx-value)))
                 (< (abs approx-value) *eps*))
              ((and (zerop approx-value) (not (zerop result)))
                 (< (abs result) *eps*))
              ((and (zerop result) (zerop approx-value))
                 t)
              (t
                (< (abs (/ (- result approx-value) result)) *tolerance*)))
        (error "Both items were not numbers.")))


∂22-Sep-86  0927	hpfclp!paul@hplabs.HP.COM 	Floating Point Tests   
Received: from HPLABS.HP.COM by SAIL.STANFORD.EDU with TCP; 22 Sep 86  09:24:46 PDT
Received: by hplabs.HP.COM ; Mon, 22 Sep 86 09:23:41 pdt
Date: Mon, 22 Sep 86 09:23:41 pdt
From: hpfclp!paul@hplabs.HP.COM
To: CL-Validation@SAIL.STANFORD.EDU
Subject: Floating Point Tests


Here is a **SIMPLE** function that I have used to test if 2 floating point
numbers are "about" equal. The two DEFCONSTANT's should be computed from
some of the constants in CLtL for more accurate comparisons. I just wanted a
quick and dirty way to compare expressions that produced floating point
results.

     Paul Beiser
     HP Ft. Collins, CO


    (defconstant *tolerance* 1.0E-10)
    (defconstant *eps* 1.0E-14)

    (defun approx= (result approx-value) 
      ;;
      ;; See if 2 numbers are approximately =. If either of the numbers
      ;; are 0, special care must be taken to ensure the validity of the
      ;; tests. For example,
      ;;
      ;;    (sin PI/2) is mathematically equal to zero, but the library
      ;;    routine may return a result that is less than *eps* away.
      ;;
      (if (and (numberp result) (numberp approx-value))
        (cond ((and (complexp result) (complexp approx-value))
	       (and (approx= (realpart result) (realpart approx-value))
		    (approx= (imagpart result) (imagpart approx-value))))
	      ((and (zerop result) (not (zerop approx-value)))
                 (< (abs approx-value) *eps*))
              ((and (zerop approx-value) (not (zerop result)))
                 (< (abs result) *eps*))
              ((and (zerop result) (zerop approx-value))
                 t)
              (t
                (< (abs (/ (- result approx-value) result)) *tolerance*)))
        (error "Both items were not numbers.")))





∂22-Sep-86  1155	fateman@renoir.Berkeley.EDU 	Re:  Floating Point Tests 
Received: from RENOIR.Berkeley.EDU by SAIL.STANFORD.EDU with TCP; 22 Sep 86  11:55:08 PDT
Received: by renoir.Berkeley.EDU (5.53/1.16)
	id AA01747; Mon, 22 Sep 86 11:55:39 PDT
Date: Mon, 22 Sep 86 11:55:39 PDT
From: fateman@renoir.Berkeley.EDU (Richard Fateman)
Message-Id: <8609221855.AA01747@renoir.Berkeley.EDU>
To: CL-Validation@sail.stanford.edu, hpfclp!paul@hplabs.hp.com
Subject: Re:  Floating Point Tests

There is a lot of material on error analysis, evaluation of elementary
functions (e.g. sine, log), floating-point arithmetic, etc.  in the open
literature. 

Rather than re-inventing and simplifying the field of error analysis from
the  collective naive point of systems implementors, lets try to
use the programs etc. in the literature.

What we do have to understand though, is what we are going to test. 
I do not see any requirements on accuracy in CLtL. Do we want to test
for accuracy?  Do we want to test for such properties as monotonicity?
(that is, if x>y, then (log x) >= (log y) ?)  Proper branch cuts?
that precision of double exceeds that of single?
Adherence to the IEEE floating point standard?


(if the last of these, there is a set of test vectors available from UCB but
frankly, this is something that the hardware vendor should test, once,
for all languages).

Are we testing rational arithmetic?



∂22-Sep-86  1330	@DESCARTES.THINK.COM:gls@AQUINAS.THINK.COM 	Re:  Floating Point Tests 
Received: from GODOT.THINK.COM by SAIL.STANFORD.EDU with TCP; 22 Sep 86  13:29:56 PDT
Received: from DESCARTES.THINK.COM by Godot.Think.COM; Mon, 22 Sep 86 16:25:15 edt
Date: Mon, 22 Sep 86 16:26 EDT
From: Guy Steele <gls@Think.COM>
Subject: Re:  Floating Point Tests
To: fateman@renoir.Berkeley.EDU, CL-Validation@sail.stanford.edu,
        hpfclp!paul@hplabs.hp.com
Cc: gls@AQUINAS
In-Reply-To: <8609221855.AA01747@renoir.Berkeley.EDU>
Message-Id: <860922162626.3.GLS@DESCARTES.THINK.COM>

I agree almost completely with Fateman's remarks.  There is no point in
reinventing the wheel; there is a lot of literature out there on
software testing in general and testing of floating-point routines in
particular.  It is also worthwhile to lay out ahead of time exactly
what is to be tested for.

My one quibble is that I believe that there is value in running the
UCB test vectors over a language implementation even though the
hardware has already passed them.  It is all too easy to make a simple
slip in the software implementation, especially when stamping out many
similar routines by making copies of source test and then tweaking
them.  It is all too easy for <= to accidentally get implemented as <
or as >=.  Tests will catch stupid things like this even while testing
for more subtle problems.

--Guy

∂23-Sep-86  1119	berman@vaxa.isi.edu 	Floating Point Tests    
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 23 Sep 86  11:19:48 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA14028; Tue, 23 Sep 86 11:21:23 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8609231821.AA14028@vaxa.isi.edu>
Date: 23 Sep 1986 1121-PDT (Tuesday)
To: CL-validation@su-ai.arpa
Cc: 
Subject: Floating Point Tests


It looks like my original questions were not quite answered.  I realize (a)
that it is difficult to make a "pure" test that exercises *only* the feature
being tested, (b) that floating point arithmetic is an inexact science, (c)
that there are certainly a lot of published methods for evaluating floating
point implementations, (d) that we are testing Common Lisp, not some abstract
notion of floating point arithmetic.

If there is discussion on the "correctness" of some aspect of floating point
arithmetic in CL, it is best put to the technical committee so it can be
brought up as a proper topic on the main CL mailing list.  What I am
interested in is help regarding the testing of CL, using what hooks are
available per the gray book and subsequent decisions and claifications.  I
don't really care if we are also testing the reader, because there will be a
section of the suite which fully tests the reader.  If we get strange test
results because of reader bugs, fine.

That argument can be taken ad nauseum, because if I can't use the reader 'cuz
it may not work *at all*, then don't bother running the test suite just yet.
Yes, floating point stuff is wierd, so it is a point where the reader may bog
down in particular.  

What the target is is simple:  A bunch of tests which test fully (but not
beyond that point) every feature and interaction in CL.  Can this really be
done?  No, of course not.  But without this target I imagine that the suite
would be pretty pointless.  For a first cut I would like some tests of each
function, macro, special form, constant and variable defined as part of the CL
standard.

So, what I was hoping for in the original case was some method, using just the
supplied CL values, of comparing floating point numbers so that I can tell if
they are equal "enough" according to the available information.  And of course
we *are* going to try to test the accuracy of transcendental and other
functions, so some kind of predicate is required.

I hope this will help to route the further discussions appropriately.  I am
aware that the process of writing this suite will open cans-of-worms for
discussion more broadly.  This is quite expected.  But I don't want to lose
the original thread of my point amongst this discussion.

Thanks for all the replies, and I hope we can figure out this first technical
question.

Best,

RB

∂23-Sep-86  1308	fateman@renoir.Berkeley.EDU 	Re:  Floating Point Tests 
Received: from RENOIR.Berkeley.EDU by SAIL.STANFORD.EDU with TCP; 23 Sep 86  13:08:13 PDT
Received: by renoir.Berkeley.EDU (5.53/1.16)
	id AA15712; Tue, 23 Sep 86 13:08:55 PDT
Date: Tue, 23 Sep 86 13:08:55 PDT
From: fateman@renoir.Berkeley.EDU (Richard Fateman)
Message-Id: <8609232008.AA15712@renoir.Berkeley.EDU>
To: CL-validation@su-ai.arpa, berman@vaxa.isi.edu
Subject: Re:  Floating Point Tests

You say
 we *are* going to try to test the accuracy of transcendental and other
functions, so some kind of predicate is required.

Is there any current language standard that does this?

I'm not saying it would be a bad thing to test floating point accuracy,
just that it isn't traditionally part of the conformance to a language
standard. Would you disqualify an implementation based on its inaccuracy?

If we are collecting accuracy-sensitive programs for reasons other than
validation, I suggest someone translate the ELEFUNT package to Lisp.

∂23-Sep-86  1348	berman@vaxa.isi.edu 	Re:  Floating Point Tests    
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 23 Sep 86  13:48:35 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA15244; Tue, 23 Sep 86 13:49:17 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8609232049.AA15244@vaxa.isi.edu>
Date: 23 Sep 1986 1349-PDT (Tuesday)
To: fateman@renoir.Berkeley.EDU (Richard Fateman)
Cc: CL-validation@su-ai.arpa
Subject: Re:  Floating Point Tests
In-Reply-To: Your message of Tue, 23 Sep 86 13:08:55 PDT.
             <8609232008.AA15712@renoir.Berkeley.EDU>


>Date: Tue, 23 Sep 86 13:08:55 PDT
>From: fateman@renoir.Berkeley.EDU (Richard Fateman)
>Message-Id: <8609232008.AA15712@renoir.Berkeley.EDU>
>To: CL-validation@su-ai.arpa, berman@vaxa.isi.edu
>Subject: Re:  Floating Point Tests
>
>You say
> we *are* going to try to test the accuracy of transcendental and other
>functions, so some kind of predicate is required.
>
>Is there any current language standard that does this?
>
>I'm not saying it would be a bad thing to test floating point accuracy,
>just that it isn't traditionally part of the conformance to a language
>standard. Would you disqualify an implementation based on its inaccuracy?

What I mean is...that since we must test these functions at all, we must be
able to compare their results in some implementation independent manner.  Thus
I would like a routine that compares two floating point numbers.  If we are to
say that there is no such predicate, then any function returning a calculated
floating point number is untestable.  Is (sqrt 4) ok if it returns 17?
Exactly how far off is acceptable?  Perhaps this is a better issue the
technical body.  Or is there some place in CLtL that can clarify this?

In section 12.5 of CLtL there is mention of a "floating point cookbook" that
is recommended as a reference for implementation of the irrational and
transcendental routines.  Of course there is no requirement that this book be
followed.

So can anybody recommend some way to test these things?  There is a *correct*
answer for precision n to any legal values given to floating point routines.
Some deviation for implementation on a computer is allowable and should be
calculatable to some extent.

As for disqualifying any implementation...I don't have anything to do with
that.  The results of the test suite go the technical committee.  I would
assume that glaring innacuracies in floating point should be revealed by a
valid test suite.  

See Ya...

RB

∂24-Sep-86  0155	@REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU 	Re: Running Tests    
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 24 Sep 86  01:54:57 PDT
Received: from DUANE.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 3995; Wed 24-Sep-86 04:55:57 EDT
Date: Wed, 24 Sep 86 04:55 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: Re: Running Tests
To: berman@vaxa.isi.edu, Fahlman@C.CS.CMU.EDU
cc: cl-validation@SU-AI.ARPA
In-Reply-To: <8609171744.AA27776@vaxa.isi.edu>
Message-ID: <860924045531.1.CFRY@DUANE.AI.MIT.EDU>


    >It might be a good idea for you to document exactly what this ERRSET
    >function is supposed to do and what its syntax is.  I'm not sure that
    >this is the same in every Lisp in the world.
    >
    >-- Scott

    Yeah.  So far as I'm concerned, all it need do is something like this:

    (ERRSET <FORM>) where it evaluates <FORM> and returns it result in a list.
    E.g. (ERRSET (+ 1 2)) returns (3).  If any error occurs while evaluating
    <FORM>, then ERRSET returns NIL. 

    Most implementations have an optional second argument which controls the
    printing of error messages.  I think the default is to print them.

    Is this sufficient?
yes, but lets deviate from CL practice and spell the function reasonably.
"error-set" is a possibility but I don't see what "set" has to do with the semantics
of the function.
return-nil-if-error   says it but is pretty verbose.
not-errorp    is awkward
errorp  would work if it returned multiple values, the first being T if an error happened,
the second being the value of the expression if no error.
If error, the value of the expression would be meaningless so return NIL 
or a string of what was printed to *error-output* during the evaluation.

∂24-Sep-86  1127	berman@vaxa.isi.edu 	Re: Running Tests, ERRSET    
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 24 Sep 86  11:26:55 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA24460; Wed, 24 Sep 86 11:28:01 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8609241828.AA24460@vaxa.isi.edu>
Date: 24 Sep 1986 1127-PDT (Wednesday)
To: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Cc: cl-validation@SU-AI.ARPA
Subject: Re: Running Tests, ERRSET
In-Reply-To: Your message of Wed, 24 Sep 86 04:55 EDT.
             <860924045531.1.CFRY@DUANE.AI.MIT.EDU>


As regards the name of the function -- I don't much care.  I would rather it
didn't have to return multiple values for the sake of simplicity in the test
manager.

Any other feelings about this?  I don't mind a verbose function name because
it is used once or twice in the whole program, and is only for the test
manager.

RB

∂24-Sep-86  1633	jeff%aiva.edinburgh.ac.uk@Cs.Ucl.AC.UK 	ERRSET    
Received: from CS.UCL.AC.UK by SAIL.STANFORD.EDU with TCP; 24 Sep 86  14:15:59 PDT
Received: from aiva.edinburgh.ac.uk by 44d.Cs.Ucl.AC.UK   via Janet with NIFTP
           id a009874; 24 Sep 86 19:46 BST
From: Jeff Dalton <jeff%aiva.edinburgh.ac.uk@Cs.Ucl.AC.UK>
Date: Wed, 24 Sep 86 19:48:15 -0100
Message-Id: <4730.8609241848@aiva.ed.ac.uk>
To: cl-validation@su-ai.arpa
Subject: ERRSET
Cc: cl-error-handling@su-ai.arpa
Comment: Remailed at SU-AI after delay caused by mailing list error.

   
    >>>It might be a good idea for you to document exactly what this ERRSET
    >>>function is supposed to do and what its syntax is. 

    >>Yeah.  So far as I'm concerned, all it need do is something like this:
   
    >>(ERRSET <FORM>) where it evaluates <FORM> and returns it result in
    >>a list.  If any error occurs while evaluating <FORM>, then ERRSET
    >>returns NIL. 
    >>Most implementations have an optional second argument which controls the
    >>printing of error messages.  I think the default is to print them.
    >>Is this sufficient?

    >yes, but lets deviate from CL practice and spell the function reasonably.
    >errorp would work if it returned multiple values, the first being T
    >if an error happened, the second being the value of the expression
    >if no error.  If error, the value of the expression would be
    >meaningless so return NIL or a string of what was printed to
    >*error-output* during the evaluation.

Hummm...  Isn't this sort of thing being dealt with by the CL error-
handling proposal?  In it, there is a form IGNORE-ERRORS &body forms
that returns the value(s) returned by the last form in 'forms' if
no error (condition of type error) occurs; and nil otherwise.  The 
error message is not printed.

Is this what's needed or would something different be better?  I'm
thinking that the following things might need to change:
  .  The error message is not printed.
  .  There is no way to distinguish between an error and the last
     form returning a single value of nil.
  .  There is no indication of which error occurred.
The last two points could be handled by having IGNORE-ERRORS return
an extra first value before the ones returned by the last body form.
This value could be the condition signalled or else nil if no error.
(Note that conditions (and hence errors) are data objects in their
own right -- they're better things than message text for programs to
handle.)

Of course, the proposed error system contains more general forms that
could be used to construct whatever's needed, but, given that the
proposal contains a simple form (IGNORE-ERRORS) for the simple case,
there's no reason not to make it the right simple form.

-- Jeff

∂25-Sep-86  1335	RPG  	Floating Point Tests    
To:   cl-validation@SAIL.STANFORD.EDU 

When I was doing the benchmarking work I started to code up
some tests from Cody and Waite (Software Manual for the Elementary Functions)
to determine some performance and accuracy information about the various Lisps.
I got a MacLisp version running, which prints out a lot of information about
the floating point hardware of the machine, etc. The program was written before
people used FORMAT, so the output is stupid looking, but here is an example:

;;; Do: (sqrt-test) (arctan-test) (show-results)

(TEST OF SQRT (X * X) - X) 
(17500 RANDOM ARGUMENTS WERE TESTED IN THE INTERVAL (0.70710678 1.0)) 
(SQRT (X) WAS LARGER 1152 TIMES) 
(IT AGREED 16326 TIMES) 
(IT WAS SMALLER 0 TIMES) 
(THERE ARE 33 BASE 2 SIGNIFICANT DIGITS IN A FLOATING-POINT NUMBER) 
(THE MAXIMUM RELATIVE ERROR OF 1.05350655E-8 = 2 ↑ -26.5002255 OCCURRED 
FOR X = 0.707217306) 
(THE ESTIMATED LOSS OF BASE 2 SIGNIFICANT DIGITS IS 0.499774456) 
(THE ROOT MEAN SQUARE RELATIVE ERROR WAS 2.61463252E-9 = 2 ↑ -28.5107443) 
(THE ESTIMATED LOSS OF BASE 2 SIGNIFICANT DIGITS IS 0.0) 
(TEST OF SQRT (X * X) - X) 
(17500 RANDOM ARGUMENTS WERE TESTED IN THE INTERVAL (1.0 1.41421357)) 
(SQRT (X) WAS LARGER 7530 TIMES) 
(IT AGREED 7750 TIMES) 
(IT WAS SMALLER 0 TIMES) 
(THERE ARE 33 BASE 2 SIGNIFICANT DIGITS IN A FLOATING-POINT NUMBER) 
(THE MAXIMUM RELATIVE ERROR OF 1.48971613E-8 = 2 ↑ -26.0003872 OCCURRED 
FOR X = 1.0002685) 
(THE ESTIMATED LOSS OF BASE 2 SIGNIFICANT DIGITS IS 0.99961281) 
(THE ROOT MEAN SQUARE RELATIVE ERROR WAS 8.7896637E-9 = 2 ↑ -26.761545) 
(THE ESTIMATED LOSS OF BASE 2 SIGNIFICANT DIGITS IS 0.238455057) 
(TEST OF SPECIAL ARGUMENTS) 
(SQRT (*XMIN*) = SQRT (2.93873587E-39) = 5.421011E-20) 
(SQRT (1.0 - *EPSNEG*) = SQRT (1.0 - 7.4505806E-9) = 1.0) 
(SQRT (1.0) = 1.00000001) 
(SQRT (1.0 + *EPS*) = SQRT (1.0 + 7.4505806E-9) = 1.00000001) 
(SQRT (*XMAX*) = SQRT (1.70141183E+38) = 1.30438179E+19) 
(TEST OF ERROR RETURNS) 
(SQRT WILL BE CALLED WITH AN ARGUMENT OF 0.0 THIS SHOULD NOT TRIGGER AN 
ERROR) 
(SQRT RETURNED THE VALUE 0.0) 
(SQRT WILL BE CALLED WITH AN ARGUMENT OF -1.0 THIS SHOULD TRIGGER AN ERROR) 
(SQRT RETURNED THE VALUE 0.0) 
(THIS CONCLUDES THE TESTS) 
(TEST OF ARCTAN (X) VS TRUNCATED TAYLOR SERIES) 
(17500 RANDOM ARGUMENTS WERE TESTED FROM THE INTERVAL (-0.0625 0.0625)) 
(ARCTAN (X) WAS LARGER 4 TIMES) 
(IT AGREED 17472 TIMES) 
(IT WAS SMALLER 2 TIMES) 
(THERE ARE 33 SIGNIFICANT BASE 2 DIGITS IN A FLOATING-POINT NUMBER) 
(THE MAXIMUM RELATIVE ERROR OF 1.3543158E-8 = 2 ↑ -26.1378605 OCCURRED 
FOR X = -0.0343970642) 
(THE ESTIMATED LOSS OF BASE 2 SIGNIFICANT DIGITS IS 0.86213946) 
(THE ROOT MEAN SQUARE RELATIVE ERROR WAS 2.63454574E-10 = 2 ↑ -31.8217268) 
(THE ESTIMATED LOSS OF BASE 2 SIGNIFICANT DIGITS IS 0.0) 
(TEST OF ARCTAN (X) VS ARCTAN (1 // 20) + ARCTAN ((X - 1 // 20) // (1 + 
X // 20))) 
(17500 RANDOM ARGUMENTS WERE TESTED FROM THE INTERVAL (0.0625 0.267949194)) 
(ARCTAN (X) WAS LARGER 2262 TIMES) 
(IT AGREED 12263 TIMES) 
(IT WAS SMALLER 2733 TIMES) 
(THERE ARE 33 SIGNIFICANT BASE 2 DIGITS IN A FLOATING-POINT NUMBER) 
(THE MAXIMUM RELATIVE ERROR OF 1.48968492E-8 = 2 ↑ -26.0004175 OCCURRED 
FOR X = 0.12569189) 
(THE ESTIMATED LOSS OF BASE 2 SIGNIFICANT DIGITS IS 0.99958253) 
(THE ROOT MEAN SQUARE RELATIVE ERROR WAS 5.965045E-9 = 2 ↑ -27.3208199) 
(THE ESTIMATED LOSS OF BASE 2 SIGNIFICANT DIGITS IS 0.0) 
(TEST OF 2 * ARCTAN (X) VS ARCTAN (2X // (1 - X * X))) 
(17500 RANDOM ARGUMENTS WERE TESTED FROM THE INTERVAL (0.267949194 0.414213568)) 
(ARCTAN (X) WAS LARGER 3727 TIMES) 
(IT AGREED 11336 TIMES) 
(IT WAS SMALLER 2213 TIMES) 
(THERE ARE 33 SIGNIFICANT BASE 2 DIGITS IN A FLOATING-POINT NUMBER) 
(THE MAXIMUM RELATIVE ERROR OF 2.84573352E-8 = 2 ↑ -25.066624 OCCURRED 
FOR X = 0.26796681) 
(THE ESTIMATED LOSS OF BASE 2 SIGNIFICANT DIGITS IS 1.93337607) 
(THE ROOT MEAN SQUARE RELATIVE ERROR WAS 7.6160271E-9 = 2 ↑ -26.9683142) 
(THE ESTIMATED LOSS OF BASE 2 SIGNIFICANT DIGITS IS 0.031685829) 
(17500 RANDOM ARGUMENTS WERE TESTED FROM THE INTERVAL (0.414213568 1.0)) 
(ARCTAN (X) WAS LARGER 5633 TIMES) 
(IT AGREED 11504 TIMES) 
(IT WAS SMALLER 141 TIMES) 
(THERE ARE 33 SIGNIFICANT BASE 2 DIGITS IN A FLOATING-POINT NUMBER) 
(THE MAXIMUM RELATIVE ERROR OF 1.99999665 = 2 ↑ 0.99999757 OCCURRED FOR 
X = 1.00000264) 
(THE ESTIMATED LOSS OF BASE 2 SIGNIFICANT DIGITS IS 27.9999976) 
(THE ROOT MEAN SQUARE RELATIVE ERROR WAS 0.0223606422 = 2 ↑ -5.4828946) 
(THE ESTIMATED LOSS OF BASE 2 SIGNIFICANT DIGITS IS 21.5171053) 
(SPECIAL TESTS) 
(THE IDENTITY: ARCTAN (-X) = -ARCTAN (X) WILL BE TESTED) 
(X : F (X) + F (-X)) 
(4.1264082 : 0.0) 
(0.80102554 : 0.0) 
(0.128191695 : 0.0) 
(1.02396178 : 0.0) 
(2.99522245 : 0.0) 
(THE IDENTITY ARCTAN (X) = X FOR X SMALL WILL BE TESTED) 
(X : X - F (X)) 
(6.56069255E-9 : 0.0) 
(3.28034627E-9 : 0.0) 
(1.64017314E-9 : 0.0) 
(8.2008656E-10 : 0.0) 
(4.1004328E-10 : 0.0) 
(THE IDENTITY ARCTAN (X // Y) = ARCTAN2 (X Y) WILL BE TESTED) 
(THE FIRST COLUMN OF RESULTS SHOULD BE 0 AND THE SECOND SHOULD BE +-π) 
(X : Y : F1 (X // Y) - F2 (X Y) : F1 (X // Y) - F2 (X // -Y)) 
(-1.7193766 : 0.76948133 : 0.0 : 3.14159265) 
(-1.2593356 : 0.145762308 : 0.0 : 3.14159265) 
(-1.11884652 : 0.5360462 : 0.0 : 3.14159265) 
(-1.97689867 : 0.72191647 : 0.0 : 3.14159265) 
(-1.04176486 : 0.944848426 : 0.0 : 3.14159265) 
(TEST OF VERY SMALL ARGUMENT) 
(ARCTAN (1.2621776E-29) = 1.2621776E-29) 
(TEST OF ERROR RETURNS) 
(ARCTAN WILL BE CALLED WITH THE ARGUMENT 1.70141183E+38) 
(THIS SHOULD NOT TRIGGER AN ERROR MESSAGE) 
(ARCTAN (1.70141183E+38) = 1.57079633) 
(ARCTAN2 WILL BE CALLED WITH THE ARGUMENTS 1.0 0.0) 
(THIS SHOULD NOT TRIGGER AN ERROR MESSAGE) 
(ARCTAN2 (1.0 0.0) = 1.57079633) 
(ARCTAN2 WILL BE CALLED WITH THE ARGUMENTS 2.93873587E-39 1.70141183E+38) 
(THIS SHOULD NOT TRIGGER AN ERROR MESSAGE) 
(ARCTAN2 (2.93873587E-39 1.70141183E+38) = 1.57079633) 
(ARCTAN2 WILL BE CALLED WITH THE ARGUMENTS 1.70141183E+38 2.93873587E-39) 
(THIS SHOULD NOT TRIGGER AN ERROR MESSAGE) 
(ARCTAN2 (1.70141183E+38 2.93873587E-39) = 0.0) 
(ARCTAN2 WILL BE CALLED WITH THE ARGUMENTS 0.0 0.0) 
(THIS SHOULD TRIGGER AN ERROR MESSAGE) 
(ARCTAN2 (0.0 0.0) = 0.0) 
(THIS CONCLUDES THE TESTS) 

I will send the code to Richard Berman for his enjoyment.

			-rpg-

∂25-Sep-86  1458	@WAIKATO.S4CC.Symbolics.COM:KMP@STONY-BROOK.SCRC.Symbolics.COM 	The straight story on ERRSET and related topics    
Received: from [128.81.51.90] by SAIL.STANFORD.EDU with TCP; 25 Sep 86  14:54:31 PDT
Received: from RIO-DE-JANEIRO.SCRC.Symbolics.COM by WAIKATO.S4CC.Symbolics.COM via CHAOS with CHAOS-MAIL id 61431; Thu 25-Sep-86 17:51:11 EDT
Date: Thu, 25 Sep 86 17:50 EDT
From: Kent M Pitman <KMP@SCRC-STONY-BROOK.ARPA>
Subject: The straight story on ERRSET and related topics
To: dfm@JASPER.PALLADIAN.COM
cc: CL-ERROR-HANDLING@SU-AI.ARPA, CL-VALIDATION@SU-AI.ARPA
In-Reply-To: <860925124537.1.DFM@WHITBY.Palladian.COM>
References: <4730.8609241848@aiva.ed.ac.uk>
Message-ID: <860925175039.3.KMP@RIO-DE-JANEIRO.SCRC.Symbolics.COM>

ERRSET is an antiquated concept exactly because of the issue you raise.
It blurs the issue of control information and data information by having
a single exit point and trying to pass information at that exit point
which tells you which of two possible branches you might want to take.

IGNORE-ERRORS is the approximate analog of ERRSET in the error proposal
as it stands now, but the intent is not to solve all of ERRSET's problems.
The problems are there no matter how you define things so long as you have
to communicate failure information as data.

CONDITION-CASE (and more generally CONDITION-BIND) allow transfer of 
control to any of a number of continuations based on whether an error
occurs and on the characteristics of that error. This functionality 
has been used extensively in the entire family of Lisp Machine lisps 
and has proved quite effective. It provides the necessary flexibility
without getting caught up in issues such as the one you mentioned that
would indeed be caused if all it did was set itself up to lose in 
some cases such as the one you mention when the maximum return values
are already being returned.

By the way, unlike several useful extensions which have been traditionally
been available on the LispM only, the error system doesn't depend on any
special hardware that makes it impractical on other systems, as I think
the sample implementation I announced a while back demonstrates. It's more
just a matter of cultural compatibility that has until now kept this
kind of facility on the Lisp Machine.

-- The rest of this message is an aside to the CL-VALIDATION folks, who've
-- likely not followed the discussions on CL-ERROR-HANDLING.

It's important to understand that things like ERRSET and IGNORE-ERRORS
are likely to sometimes be just too blunt to treat each of the variety
of errors that might occur in an appropriate way. I am working under the
assumption that we will be able to provide you with something which is
much more flexible and much more what you'll need toward this end.

The discussion of IGNORE-ERRORS and CONDITION-CASE above are in
reference at Error Proposal #8. It's not an approved document and is not
even up for a vote, so probably shouldn't be called a "proposal".  It's
just the latest working draft that the group has available to make
comments on.  A large number of comments have been made which will have
to be addressed in a subsequent draft, so while technical commentary on
this draft is quite appropriate, and people should play with the spec
as written, no one should assume that anything in it is cast in concrete.

If you haven't seen error proposal #8, you might FTP the following two 
files:
 MIT-AI: COMMON; EPROP8 TEXT
 MIT-AI: COMMON; EPROP8 LISP
(Randy Parker at Palladian may already have gotten you a copy, dfm.)

The sample implementation is not intended for production work, but I
think that (modulo any out-and-out bugs) it implements more or less all
of the essential functional behavior of this particular proposal. Note
well, however, that only the spec is the document and the code is only
for illustrative purposes. Nothing in the code has any weight in the
standard itself. You're welcome (and encouraged) to try to code up your 
own implementation with or without looking at mine, and to submit any
gripes to CL-ERROR-HANDLING.

You can be added to CL-ERROR-HANDLING by sending mail to RPG@SU-AI
asking to be added. I'm assuming that people on CL-VALIDATION who worry
about errors are already on CL-ERROR-HANDLING or will now ask to be
added, so please let's move any further discussion back off to 
CL-ERROR-HANDLING and not cc CL-VALIDATION unless something new comes
up which would be of interest to all of them so their mailboxes are
not needlessly cluttered.

∂26-Sep-86  1203	berman@vaxa.isi.edu 	Re: The straight story on ERRSET and related topics   
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 26 Sep 86  12:03:36 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA13283; Fri, 26 Sep 86 12:00:10 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8609261900.AA13283@vaxa.isi.edu>
Date: 26 Sep 1986 1200-PDT (Friday)
To: Kent M Pitman <KMP@SCRC-STONY-BROOK.ARPA>
Cc: CL-ERROR-HANDLING@SU-AI.ARPA, CL-VALIDATION@SU-AI.ARPA,
        dfm@JASPER.PALLADIAN.COM
Subject: Re: The straight story on ERRSET and related topics
In-Reply-To: Your message of Thu, 25 Sep 86 17:50 EDT.
             <860925175039.3.KMP@RIO-DE-JANEIRO.SCRC.Symbolics.COM>


This whole discussion of ERRSET has gotten way out of line.  I am not
proposing ANYTHING having to do with error control.  PERIOD.  I was simply
trying to find out if current implementations have enough error control
already, regardless of the form, for me to assume I could implement an
ERRSET-like function in them.  The conclusion was yes.  I needed just that
level of control for the test controller.

As a matter of fact, I would rather neither wait until the error stuff was
official, nor use it fully when it is in place because that is one area that
will be tested, and to rely heavily upon it in the test controller would be a
mistake.  The simpler the mechanism the better so far as my task is concerned.
This was not ever, nor is it now, really an issue for the error handling
people, or any of the technical discussion groups.  I am raising no questions
about the correctness or ERRSET or any particular method of handling errors.

As a matter of fact, it isn't a discussion issue.  It was just a survey of the
existing error control mechanisms amongst vendors.

OK?????

Best,

RB

∂30-Sep-86  1220	berman@vaxa.isi.edu 	test stuff    
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 30 Sep 86  12:20:36 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA15013; Tue, 30 Sep 86 12:22:10 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8609301922.AA15013@vaxa.isi.edu>
Date: 30 Sep 1986 1222-PDT (Tuesday)
To: cl-validation@su-ai.arpa
Cc: 
Subject: test stuff

Here are some sample tests.  The first few are pretty simple, and the last one
is a test sequence.  I am preparing more tests which will show all the various
features.  Please, no flames about the content of these tests.

Best,

RB
---------------------------------------------
;; -*-  Mode:Common-Lisp; Base: 10; Package:cl-tests  -*-

(in-package 'cl-tests)

(setq *contrib$* "CDC.  Test case written by Richard Hufford.")
(setq *doc$* nil)

(deftest
  (acons acons-1)
  (equal (acons 'frog 'amphibian nil) (frog . amphibian))
  :doc$ "ACONS to NIL")
  
(deftest
  (acons acons-2)
  (equal
    (acons 'frog
	   'amphibian
	   '((duck . bird)(goose . bird)(dog . mammal)))  
    (frog . amphibian)(duck . bird)(goose . bird)(dog . mammal))
  :doc$ "acons to a-list")

(deftest
  (acons acons-3)
  (equal (acons 'frog nil nil) ((frog)))
  :doc$ "acons nil datum")

(deftest
  (acons acons-4)
  (equal
    (acons 'frog
	   '(amphibian warts webbed-feet says-ribbet)
	   nil)  
    ((frog . (amphibian warts webbed-feet says-ribbet))))
  :doc "acons with list datum")


*********  Now some ACOS tests.

;; -*-  Mode:Common-Lisp; Base: 10; Package:cl-tests  -*-

(in-package 'cl-tests)

 
(deftest-seq
  (acos cdc-acos-tests)
  (((acos-1)
    (ACOS-TEST-FN (ACOS  1.0000)  0.0000))
   ((acos-2)
    (ACOS-TEST-FN (ACOS  0.9659)  0.2619))
   ((acos-3)
    (ACOS-TEST-FN (ACOS  0.8660)  0.5237))
   ((acos-4)
    (ACOS-TEST-FN (ACOS  0.7071)  0.7854))
   ((acos-5)
    (ACOS-TEST-FN (ACOS  0.5000)  1.0472))
   ((acos-6)
    (ACOS-TEST-FN (ACOS  0.2588)  1.3091))
   ((acos-7)
    (ACOS-TEST-FN (ACOS  0.0000)  1.5708))
   ((acos-8)
    (ACOS-TEST-FN (ACOS  -0.2588)  1.8326))
   ((acos-9)
    (ACOS-TEST-FN (ACOS  -0.5000)  2.0944))
   ((acos-10)
    (ACOS-TEST-FN (ACOS  -0.7071)  2.3562))
   ((acos-11)
    (ACOS-TEST-FN (ACOS  -0.8660)  2.6180))
   ((acos-12)
    (ACOS-TEST-FN (ACOS  -0.9659)  2.8799))
   ((acos-13)
    (ACOS-TEST-FN (ACOS -1.0000)  3.1416)))
  :setup  (DEFUN ACOS-TEST-FN (ARG1 ARG2)
	    (PROG (RES) (COND ((= ARG1 ARG2) (RETURN T))
			      ((= ARG2 0.0) (RETURN (AND (> ARG1 -1E-9)
							 (< ARG1 1E-9))))
			      (T (SETQ RES (/ ARG1 ARG2))
				 (RETURN (AND (> RES 0.9999)
					      (< RES 1.0001)))))))
  :unsetup (fmakunbound 'acos-test-fn)
  :contrib$ "CDC.  Test case written by BRANDON CROSS, SOFTWARE ARCHITECTURE AND ENGINEERING"
  :doc$ nil)



∂30-Sep-86  2155	@REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU 	test stuff 
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 30 Sep 86  21:55:15 PDT
Received: from GAYE.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 4707; Wed 1-Oct-86 00:55:43 EDT
Date: Wed, 1 Oct 86 00:55 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: test stuff
To: berman@vaxa.isi.edu, cl-validation@SU-AI.ARPA
In-Reply-To: <8609301922.AA15013@vaxa.isi.edu>
Message-ID: <861001005539.1.CFRY@GAYE.AI.MIT.EDU>

Thanks for the examples. They tell me more than pages and pages of text.
They look good. I can imagine saving a little typing
by combining some of the features of both deftest and 
deftest-seq. deftest-seq appears to lack the ability to
take a predicate like deftest does.
Also, can't we make an automatic test-name generator
so that it assigns "acos-1" to the first test, "acos-2" to the next test?
Take the name of the seq and string-append  highest number + 1 made yet for
something of that name.
Also, set-up and unsetup could simple be the first few and last few forms
of a test sequence. Here's my simplified model:
(deftests 'acos
  (progn (defun test-acos () ...) t) ;the set up. We might want a form called RETURN-T
     ;which evaluates all of its args and returns T. or you could call the form SET-UP.
     ;Maybe such calls would not get an official ACOS-23 made up for them.
     ; but if they error, you do need a way to tell the user, so maybe you do want 
     ; to do (eq (defun foo ...) 'foo) and treat it as a regular test. That certainly
     ; simplifies implementation.
  (equal (acos ...) foo)
  (eql   (acos .....) bar)
  (test-acos ...) ;returns non-nil if works
  (= (acos ...) baz)
  (expect-error (acos -999999999999)) ;maybe there's a way to use ignore-error for this.
   ;the idea is, if the call to acos errors, then things are working as they should
   ; and expect-error will return T, else NIL.
  (progn (fmakunbound 'test-acos) t) ; unsetup
)

My deftests could be syntactic sugar written using your deftest.

I notice deftest-seq takes keyword args such that
the SETUP arg must appear textually AFTER the actual tests.
Not easy to comprehend. My deftests doesn't have this problem.

∂01-Oct-86  1036	berman@vaxa.isi.edu 	test stuff    
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 1 Oct 86  10:36:47 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA23054; Wed, 1 Oct 86 10:36:04 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8610011736.AA23054@vaxa.isi.edu>
Date:  1 Oct 1986 1035-PDT (Wednesday)
To: CL-Validation@su-ai.arpa
Subject: test stuff


I got the following today:

------- Forwarded Message

Date: Tue, 30 Sep 86 17:43 EST
Sender: mike%gold-hill-acorn@mit-live-oak.arpa
To: berman@vaxa.isi.edu (Richard Berman)
From: mike%acorn@mit-live-oak.arpa
Subject: test stuff

    From: berman@vaxa.isi.edu (Richard Berman)
    Date: 30 Sep 1986 1222-PDT (Tuesday)
    
    Here are some sample tests.  The first few are pretty simple, and
    the last one is a test sequence.  I am preparing more tests which
    will show all the various features.  Please, no flames about the
    content of these tests.
    
    Best,
    
    RB
    ---------------------------------------------


Richard,
  In the next set can you please include a sample of testing
for a side-effect operator. I don't see yet how the software
can verify that the side effect did in fact occur. The test
program has to be called with both the argument to and the 
result of the tested operator.

Rplacd would be sufficient as an example.


Thanks,

Mike Beckerle
Gold Hill Computers.

P.S. We are attempting to base some in-house testing software
on your test system. This will hopefully result in some
test/validation suite contributions from GH.



------- End of Forwarded Message

One way to do this would be to use a simple EQUAL predicate on the rplacd, but
if you wanted to be sure that the subform was shared, you could use a test
sequence DEFTEST-SEQ where the first test did the rplacd and had an EQUAL
test, and the second test did EQ on the subform.  But I will send a test like
this.

RB

∂01-Oct-86  1052	berman@vaxa.isi.edu 	test stuff    
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 1 Oct 86  10:52:02 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA23288; Wed, 1 Oct 86 10:53:17 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8610011753.AA23288@vaxa.isi.edu>
Date:  1 Oct 1986 1053-PDT (Wednesday)
To: CL-VALIDATION@su-ai.arpa
Cc: 
Subject: test stuff


>Date: Wed, 1 Oct 86 00:55 EDT
>From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
>Subject: test stuff
>To: berman@vaxa.isi.edu, cl-validation@SU-AI.ARPA


>They look good. I can imagine saving a little typing
>by combining some of the features of both deftest and 
>deftest-seq. deftest-seq appears to lack the ability to
>take a predicate like deftest does.

Actually, deftest-seq works just like deftest.  Each test is of the same form
as deftest.

>Also, can't we make an automatic test-name generator
>so that it assigns "acos-1" to the first test, "acos-2" to the next test?

I thought of this, and then decided against it because it was better to be
able to positively identify tests when an error was reported. So....how about
this:  The DEFTEST-SEQ stuff can generate these names, as you wish, but will
only do so if the name is not present (i.e. that list is NIL, or just the type
with no name).  The files that are generated by the database look just like
the original tests, but DEFTEST is called RUNTEST and DEFTST-SEQ is called
RUNTEST-SEQ.  Thus, if you take my file that I sent and change the names as
above, you can get a close approximation of the way the file should look when
it comes out of the database.

So...when the output file is generated, it will have the names that were
assigned at DEFTEST time to the database.  Thus, when an error occurs you can
do a simple text search to find out what the actual test was that blew up.

>Also, set-up and unsetup could simple be the first few and last few forms
>of a test sequence. Here's my simplified model:

Yeah, but I want to be able to spot them as separate entities.  The test
manager knows how to report an error in the setup and unsetup, so that's not a
problem.  I know it is a bit ugly, and maybe some kind of keyword stuff in the
test-seq would work, but don't forget that setup and unsetup can also appear
in a regular singular test, so there is no sequence.  Also, using PROGN
totally defeats the test-reporting accuracy of the manager.


So I'll go ahead and make deftest-seq automate conditional name generation.

Best,

RB

∂02-Oct-86  1421	@MIT-LIVE-OAK.ARPA,@GOLD-HILL-ACORN.LCS.MIT.EDU:mike%gold-hill-acorn@MIT-LIVE-OAK.ARPA 	testing side effects  
Received: from LIVE-OAK.LCS.MIT.EDU by SAIL.STANFORD.EDU with TCP; 2 Oct 86  14:21:28 PDT
Received: from GOLD-HILL-ACORN.DialNet.Symbolics.COM (DIAL|DIAL|4925473) by MIT-LIVE-OAK.ARPA via DIAL with SMTP id 11985; 2 Oct 86 17:20:32-EDT
Received: from BOSTON.Gold-Hill.DialNet.Symbolics.COM by ACORN.Gold-Hill.DialNet.Symbolics.COM via CHAOS with CHAOS-MAIL id 42125; Thu 2-Oct-86 16:05:17-EDT
Date: Thu, 2 Oct 86 16:06 EST
Sender: mike%gold-hill-acorn@mit-live-oak.arpa
To: berman@vaxa.isi.edu (Richard Berman)
From: mike%acorn@mit-live-oak.arpa
Subject: testing side effects
Cc: CL-Validation@su-ai.arpa

    From: berman@vaxa.isi.edu (Richard Berman)
    Date:  1 Oct 1986 1035-PDT (Wednesday)
    
    Richard,
      In the next set can you please include a sample of testing
    for a side-effect operator. I don't see yet how the software
    can verify that the side effect did in fact occur. The test
    program has to be called with both the argument to and the 
    result of the tested operator.
 
    For example, Rplacd.
    
    ---------------------

    One way to do this would be to use a simple EQUAL predicate on the rplacd, but
    if you wanted to be sure that the subform was shared, you could use a test
    sequence DEFTEST-SEQ where the first test did the rplacd and had an EQUAL
    test, and the second test did EQ on the subform.  But I will send a test like
    this.
    
    RB
    
No good. You HAVE to be able to test side effects easily, by
providing some predicate which is called on both the arguments to and
result of a destructive operator, This way the predicate can judge if
the side effect really took place and return t or nil.

From what I've seen thusfar of the test system, I'd say it seems
to be oriented very much toward testing of pure functional operators;
where the criteria for passing a test can be defined in terms of 
a table of input data and corresponding output data. This just
doesn't extend to destructive operations well. 

It ought to be about as easy to define a "deftest" for side effects as it
is to  write a program which notices a side effect. 
Test sequences ought to be for just that, performing a sequence of 
SEPARATE tests. A single test ought to be able to tell if a side
effect operator is working properly for given input data. 

Testing operators which are for effect is really totally different from
testing functional operators. I would expect to see somewhat different syntax
for creating the tests for them. 

Once again, show me a test set for rplacd, or even better
(setf (aref...)), and I'll be convinced.

...mike beckerle
Gold Hill Computers.    



∂02-Oct-86  1806	marick%mycroft@gswd-vms.ARPA 	testing side effects
Received: from GSWD-VMS.ARPA by SAIL.STANFORD.EDU with TCP; 2 Oct 86  18:06:24 PDT
Received: from mycroft.GSD (mycroft.ARPA) by gswd-vms.ARPA (5.51/)
	id AA09241; Thu, 2 Oct 86 20:06:35 CDT
Message-Id: <8610030106.AA09241@gswd-vms.ARPA>
Date: Thu, 2 Oct 86 20:06:33 CDT
From: marick%mycroft@gswd-vms.ARPA (Brian Marick)
To: cl-validation@su-ai.arpa, mike%gold-hill-acorn@live-oak.lcs.mit.edu
Subject: testing side effects


I disagree.  Testing a side effect by first checking the result of the
side-effecting form and then checking that the desired side effect
happened (and that no undesired effects happened) works quite reasonably
well in practice.  I see no reason to invent a new notation.


	(setq a '(a b))
	(= (setf (car a) 5) 5)
	(equal a '(5 b))

is not entirely elegant, but it's good enough.

Brian Marick
Gould Computer Systems -- Urbana

∂03-Oct-86  1102	berman@vaxa.isi.edu 	test stuff    
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 3 Oct 86  11:02:22 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA20804; Fri, 3 Oct 86 11:04:30 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8610031804.AA20804@vaxa.isi.edu>
Date:  3 Oct 1986 1104-PDT (Friday)
To: cl-validation@su-ai.arpa
Cc: 
Subject: test stuff

>Date: Wed, 1 Oct 86 17:26:36 CDT
>From: marick%mycroft@gswd-vms.ARPA (Brian Marick)
>To: berman@vaxa.isi.edu, cl-validation@sail.arpa
>Subject: Test Stuff

>Given the uncertainty about what's in what package and who inherits from
>where, the (IN-PACKAGE "TESTS") should contain an explicit :USE "LISP".

OK.

>What's this "Base: 10" I see in the mode line?  -- The more general
>question is "How does the test know it's running in a clean
>environment?"  It's amazing the damage one person with a 
>(SETQ *PRINT-CASE* :CAPITALIZE) in their init file can do.

Well, we could go a few ways with this.  The simplest is just to define the
vanilla environment and to say that the testers at the test site must ensure
this environment.  Or we could write a preface to the test which sets up the
environment, or we could do all kind of testing and reporting for the
environment.  I prefer the simplest approach -- the first one.

>I dislike the :SETUP and :UNSETUP keywords also.  

Yeah, me too.  But the reason was that the vast majority of the tests, by
actual survey through a number of *huge* test suites, don't require any such
stuff, and so I didn't feel like forcing the typing of NIL for both of these.

>I like to write my tests hierarchically:...
>
>[example]

I did not understand why you had the teardown code after the first set of
tests??? Or is it that each of the sub-groups of tests had its own code?

One of the reasons for NOT allowing setup as you have shown is that there is
no guarantee (far from it, actually) that all the tests for a given item
require the same setup/unsetup.  And allowing you to use a PROGN type of thing
so that the procedurality (is that a real word?) is apparent destroys the
ability to determine what is test and what is not.  Would you be amenable to
having a PROGNish construct where the first and last forms might be setup and
unsetup, but which MUST say (:SETUP <list of expressions>) and (:UNSETUP <list
of expressions>)??  If so, where do you suggest it go?  Since each test may
require it's own setup, that would mean we have to wrap an extra set of parens
around the test, and those few which need setups can insert the (:setup) and
(:unsetup) clauses as needed around the test.  Or we can say that you may have
either one or three forms, with one meaning no setup/unsetup, and three having
an implicit placement for the setup/unsetup.

And that is where deftest-seq comes in.  It allows you to associate a
setup/unsetup with a group of tests, besides also preserving the order of the
tests.  It sounds like this is more like what you have in mind.  

I agree that it would be more readable to have setup/unsetup be in their
"proper" places.  But I still want them clearly identified as setup/unsetup.
And if you want n test forms in a test, I still think deftest-seq is more
appropriate, especially in light of your example.  DEFTEST exists like it does
because of the nature of the majority of the existing tests.  Unless there is
some way to not have to enter such things as setup/unsetup, it stands.  I am
for the "1 or 3" forms idea as a possible solution.  But I would like to hear
more before I make any changes.

>In a huge test suite, you need just as much structure as you do in a
>huge program.  I'd like to see the test format support structure.  I
>realize that you can impose that structure from without, using the
>database, but that doesn't help me.

It does if the database is available to you, too.  Which is one of the goals.
Exactly how doesn't it help you??? In writing tests???  Certainly not in the
running of tests, because that is all managed by the test controller.   What
did you have in mind?

Best,

RB

∂03-Oct-86  1112	berman@vaxa.isi.edu 	test stuff    
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 3 Oct 86  11:12:36 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA20927; Fri, 3 Oct 86 11:14:49 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8610031814.AA20927@vaxa.isi.edu>
Date:  3 Oct 1986 1114-PDT (Friday)
To: cl-validation@su-ai.arpa
Cc: 
Subject: test stuff

>Date: Thu, 2 Oct 86 16:06 EST
>Sender: mike%gold-hill-acorn@mit-live-oak.arpa
>To: berman@vaxa.isi.edu (Richard Berman)
>From: mike%acorn@mit-live-oak.arpa
>Subject: testing side effects

    
>No good. You HAVE to be able to test side effects easily, by
>providing some predicate which is called on both the arguments to and
>result of a destructive operator, This way the predicate can judge if
>the side effect really took place and return t or nil.

Perhaps, but in that case why not just write the predicate as you've indicated
and use it as the deftest predicate???  I don't see any problem with that.
But exactly how do you give the result of a destructive operator when you
cannot make something EQ the that result without first doing the operator.
And if you do the operator first and then pass the result plus the original
args to a predicate, it is clear the the original args might now be changed.

Or do you mean that I pass the original args and something which should be
EQUAL to the result of the destructive operator?

>It ought to be about as easy to define a "deftest" for side effects as it
>is to  write a program which notices a side effect. 

It is.  Just define the predicate.

>Test sequences ought to be for just that, performing a sequence of 
>SEPARATE tests. A single test ought to be able to tell if a side
>effect operator is working properly for given input data. 

Well, if you're gonna be picky, maybe I should rename DEFTEST-SEQ to
DEFTEST-GROUP, and define it as being for any set of tests where any of the
following is true:  (a) the order of the tests must be preserved, (b) side
effects must be preserved, (c) the tests share anything global a all, such as
a predicate.

>Testing operators which are for effect is really totally different from
>testing functional operators. I would expect to see somewhat different syntax
>for creating the tests for them. 

It is not as clear to me as it is to you.  Why?  And what would you suggest?

Thanks for the ideas.  Please respond soon 'cuz I would like the answers.  It
could be real important.

RB

∂09-Oct-86  1449	@MIT-LIVE-OAK.ARPA:mike@GOLD-HILL-ACORN.LCS.MIT.EDU 	test stuff  
Received: from LIVE-OAK.LCS.MIT.EDU by SAIL.STANFORD.EDU with TCP; 9 Oct 86  14:49:22 PDT
Received: from GOLD-HILL-ACORN.DialNet.Symbolics.COM (DIAL|DIAL|4925473) by MIT-LIVE-OAK.ARPA via DIAL with SMTP id 12687; 9 Oct 86 17:16:59-EDT
Received: from BOSTON.Gold-Hill.DialNet.Symbolics.COM by ACORN.Gold-Hill.DialNet.Symbolics.COM via CHAOS with CHAOS-MAIL id 42864; Wed 8-Oct-86 21:36:05-EDT
Date: Wed, 8 Oct 86 21:36 est
Sender: mike@acorn
To: berman@vaxa.isi.edu (Richard Berman)
From: mike%acorn@mit-live-oak.arpa
Subject: test stuff
Cc: cl-validation@su-ai.arpa, 
    nv%acorn@mit-live-oak.arpa, marcia%acorn@mit-live-oak.arpa

    From: berman@vaxa.isi.edu (Richard Berman)
    Date:  3 Oct 1986 1114-PDT (Friday)
    
    >Date: Thu, 2 Oct 86 16:06 EST
    >Sender: mike%gold-hill-acorn@mit-live-oak.arpa
    >To: berman@vaxa.isi.edu (Richard Berman)
    >From: mike%acorn@mit-live-oak.arpa
    >Subject: testing side effects
    
        
    >No good. You HAVE to be able to test side effects easily, by
    >providing some predicate which is called on both the arguments to and
    >result of a destructive operator, This way the predicate can judge if
    >the side effect really took place and return t or nil.

In fact, it's even worse than I mentioned here. You have to have
history-sensitive predicates which can observe non-trivial characteristics
of the arguments before the side-effects occur, and then 
call the side-effecting operator being tested, and then call another 
test which can decide based on the earlier-found characteristics
that the appropriate changes actually were made. In other words
the predicate-mechanism must be history sensitive. 

Lots more below on this.
    
     Perhaps, but in that case why not just write the predicate as you've
     indicated and use it as the deftest predicate???  I don't see any
     problem with that.  But exactly how do you give the result of a
     destructive operator when you cannot make something EQ the that
     result without first doing the operator.  And if you do the operator
     first and then pass the result plus the original args to a predicate,
     it is clear the the original args might now be changed.  Or do you
     mean that I pass the original args and something which should be
     EQUAL to the result of the destructive operator?

Ok, here is my fairly long winded reply which I believe will clarify
the problem I have with the current test system.  My only beef is
with the ways to test side effecting operators.

To make something that is EQ to the result, you must have more
control over what is going on than the current test system seems to
want to give. You need to be able to have a testing function which
keeps the appropriate EQ pointers around and checks to insure that
they remain appropriately EQ, etc. You need a separate test function
for each side effect.

The only environment in which side effects are detectable are ones where
you can observe that the same (eq) object changes over time. EQUALness
has nothing to do with it.

Let me illustrate this point by example.  The following program is a
fairly minimal one for checking if the function rplacd does what it
is supposed to. If rplacd replaced the cdr, and eq, car, cdr, and,
and let* all work, then this returns 't, otherwise nil.

(defun is-rplacd? ()
   (let* ((x (cons 'a 'b))
          (result (rplacd x 'c)))
       (and (eq x result)  ;; is the result eq to the argument
            (eq (cdr result) 'c)     ;; cdr has new contents
            (eq (car result) 'a))))  ;; car has same contents


There are three points here:

1) What you pass in the way of data is fairly irrelevant. Unlike
arithmetic pure-functions, it is fairly pointless to pass anything to 
a test program like this. it either returns t or nil, and 
you may want to print out some info which says that it failed or
succeeded. Note particularly that you have to check what DID NOT
change. 

2) You really can't determine if rplacd works or not unless you have
enough testing CONTEXT to know that EQness is preserved.

3) I am a lisp programmer, and I know how to see if rplacd works.
Why should I have to learn how to twist some other test 
mechanism in order to achieve something that I already know how to
do. In reality, the only thing I want the test system to do is to
keep a database which includes this test, runs it at the appropriate
time, and prints out whether it succeeds or fails for me. 

What I'd like to write to make a test suite for rplacd is something
like this. This is not intended as a concrete proposal for a new
deftest, but rather as an example of what the necessary functionality
really is.

(deftest rplacd-tester
  :form-to-test rplacd
  :contrib$ "me, myself" 
  :test #'(lambda ()
            (let* ((x (cons 'a 'b))
                   (result (rplacd x 'c)))
                (and (eq x result)  
                     (eq (cdr result) 'c)   
                     (eq (car result) 'a))))
  )


I'd expect to see only a slight variation on this to test
(setf (car ...) ..)

(deftest setf-car-tester
  :form-to-test (setf (car *) *)  ;; this syntax is completely non-issue
  :contrib$ "me, myself" 
  :test #'(lambda ()
            (let* ((x (cons 'a 'b))
                   (result (setf (car x 'c)))
                (and (eq x result)  
                     (eq (cdr result) 'c)   
                     (eq (car result) 'a))))
  )


Let me now play devil's advocate. Suppose we decide that this is fine
for rplacd but we want to generalize so that less individual coding
is done for each destructive operator.  So I'm going to attempt to
generalize this notion to give us a general paradigm for testing side
effects. What is going to happen is that we're going to get something
basically unusable due to complexity.

To generalize the above kind of side effect testing so that it will
work for any destructive operator we will need a function which is
passed the test arguments BEFORE the destructive operation is
evaluated, which can sample these arguments' substructures and keep
around EQness information, until after the operator is called. For
example, here is a generalized test form for side effects.

(defun test-side-effect (operator history-gatherer decider args)
  (let* ((history (funcall history-gatherer args))
         (results (multiple-value-list (apply function args)))
    ;; now to see if it worked
      (funcall decider 
               args 
               results)))

Now for each side-effecting operator, we need a
history-gathering-function, and a deciding-function.  A
history-gathering-function is one which given the args to be passed to
the operator grabs the relevant state so you can see it
change later.  A deciding-function is one which given the history,
the arguments, and the results, determines if the operator really
worked.

(defun rplacd-history-gathering-func (arglist)
    (let ((the-cons-cell (car arglist))
          (the-rplacement-item (cadr arglist)))
      ;; need to keep the car and cdr around, so make a list of them.
      (list (car the-cons-cell) (cdr the-cons-cell))) 

(defun rplacd-decider-function (history args results)
   (let ((result (car results))) ; only one result
    (let ((previous-car (car history))  ;; access history values
          (previous-cdr (cadr history))
          (new-car (car result))        ;; to compare against new ones
          (new-cdr (cdr result)))
      (and (eq (car args) result) ;; same cell as before
           (eq previous-car new-car)     ;; car is same
           (eq (cadr args) new-cdr)      ;; cdr is new
           (not (eq previous-cdr new-cdr)) ;; no wierd side effects faking us
      ))))


Now, to finally test rplacd, you have to do

(test-side-effect 'rplacd 
                  'rplacd-history-gathering-func
                  'rplacd-decider 
                  '(a . b)
                  'c)

As I mentioned above, I think this is totally gross, way too complex,
and not reasonable. In other words, I don't think you can really do
all the parameterization that is needed easily enough.  I can't think
of anything very much simpler than this which captures all the
necessary parameterization.  Testers should write the simplest code
necessary to see if the side effects occur, like my deftest examples
above, and not worry about everything being unnessarily parameterized
and table driven when it isn't relevant.
 
    >It ought to be about as easy to define a "deftest" for side effects as it
    >is to  write a program which notices a side effect. 
    
    It is.  Just define the predicate.

I think I've made my point. The predicate must have history sensitivity
to the change in the arguments. It is tough to express this using
the things I've seen in the testing code thusfar.

    ......

    >Testing operators which are for effect is really totally different from
    >testing functional operators. I would expect to see somewhat different syntax
    >for creating the tests for them. 
    
    It is not as clear to me as it is to you.  Why?  And what would you suggest?
    
    Thanks for the ideas.  Please respond soon 'cuz I would like the answers.  It
    could be real important.
    
    RB
    
Sorry that I couldn't respond about this quicker. My machine had
a headcrash.

...mike beckerle
Gold Hill Computers    



∂09-Oct-86  1509	marick%mycroft@gswd-vms.ARPA 	test stuff
Received: from GSWD-VMS.ARPA by SAIL.STANFORD.EDU with TCP; 9 Oct 86  15:09:09 PDT
Received: from mycroft.GSD (mycroft.ARPA) by gswd-vms.ARPA (5.51/)
	id AA01622; Thu, 9 Oct 86 13:10:51 CDT
Message-Id: <8610091810.AA01622@gswd-vms.ARPA>
Date: Thu, 9 Oct 86 13:10:48 CDT
From: marick%mycroft@gswd-vms.ARPA (Brian Marick)
To: cl-validation@su-ai.arpa
Subject: test stuff



It seems to me that test suites, especially large test suites, can be as
complex as other programs/systems -- and in many of the same ways.
Consequently, the same tools and habits used to build programs are
probably useful in building and maintaining tests.

(Analogy:  Editors with a powerful underlying extension language seem to
win over those without.)

Your test suite is slanted toward declarative tests where you just
assert that two values compare in some given way.  That, I agree, works
an awful lot of the time.  But not always.  When it doesn't -- when I
really need to write procedural code -- I don't see any reason not to
write setup code as ordinary Lisp code in the obvious place.

You mention tracing an error back to a test as a reason.  I don't see
how that's made impossible or even difficult.


I'd gotten the impression that the database manager would not be
available, only some least-common-denominator version of the data
(tests).  Which will be the case?  I'm just nervous that the test suite
will be delivered to me as a single directory of 10,000 small files.
Such a thing would be worthless to me in day-to-day work.  To use the ISI
test suite as part of development, I must be able to see the structure
of the test suite, whether that structure is built into the tests or
imposed externally.

I realize that my day-to-day work is not part of your charter.
I see my suggestions as optimizations.


bem.

∂14-Oct-86  1055	marick%cthulhu@gswd-vms.ARPA 	test stuff
Received: from GSWD-VMS.ARPA by SAIL.STANFORD.EDU with TCP; 14 Oct 86  10:55:07 PDT
Received: from cthulhu.GSD (cthulhu.ARPA) by gswd-vms.ARPA (5.51/)
	id AA03818; Tue, 14 Oct 86 12:55:03 CDT
Message-Id: <8610141755.AA03818@gswd-vms.ARPA>
Date: Tue, 14 Oct 86 12:54:56 CDT
From: marick%cthulhu@gswd-vms.ARPA (Marick)
To: cl-validation@su-ai.arpa
Subject: test stuff


Yes, you will have to have syntax to distinguish sequences of non-test
forms vs. sequences of test cases.  I don't think that's a big deal.
Pick

   (def-test ... (:NON-TEST form1 form2 form3) test1 test2 test3 ...)
or
   (def-test ... non-test1 non-test2 non-test3 (:TEST-CASES form1 form2 form3))
or even
   (def-test ... (:NON-TEST form1 form2 form3) (:TEST-CASES form1 form2 form3))

The problem I have with a sequence of tests with optional :SETUP and
:UNSETUP forms at the beginning and end is only that they must be at the
beginning and end.  It might be -- in fact, to me it is -- useful to
sometimes write code in the middle of a test.  Yes, you obviously can
break such a test up into several tests with the same effect.  That's
why I'm not really all that worked up -- we're talking convenience.
Before we worry about my convenience, we should find out if anyone else
cares.


I'm guessing that errors during the non-test code will be rare.  If so,
it would be OK just to fail the whole test.  (Any such failure should
also be caught by another part of the test suite, presumably.  For
example, if my setup code for my test of displacement of adjustable
arrays fails when making an adjustable array, the make-array tests ought
to fail also.)

bem.



∂14-Oct-86  1236	berman@vaxa.isi.edu 	Re: test stuff
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 14 Oct 86  12:36:29 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA03948; Tue, 14 Oct 86 12:35:45 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8610141935.AA03948@vaxa.isi.edu>
Date: 14 Oct 1986 1235-PDT (Tuesday)
To: marick%cthulhu@gswd-vms.ARPA (Marick)
Cc: cl-validation@su-ai.arpa
Subject: Re: test stuff
In-Reply-To: Your message of Tue, 14 Oct 86 12:54:56 CDT.
             <8610141755.AA03818@gswd-vms.ARPA>


I like the :NON-TEST clause much better than :SETUP/:UNSETUP, 'cuz you can put
it anywhere.  I still don't want to put multiple tests under DEF-TEST.
Unless...we get rid of DEFTEST-SEQ and say all tests are sequences.  The
problem is that when reporting an error it is nice to say which test failed.
I could show the form, but for some tests that would not work too well because
the form is very complex.  That is also the reason why I am not real hip on
the automatically generated test name stuff for sequences.  When I say "test
foo-15 failed" and there is no "foo-15" in the test file you will have to
manually dig out the 15th foo test, which isn't always pretty.

Also, since each test can be an :eval, :no-eval or :error test, you still
don't get PROGN simplicity.  You have to wrap the test type around each test
of a sequence.  So maybe it would look like this:

(DEFTEST Foo
  (:non-test (defun foo-tester (x y) ...)
             (setq *foo-var* nil))
  (:eval <testform> <testform> <testform>)
  (:no-eval <testform> <testform>)
  (:non-test <other form> <other form>)
  (:no-eval <testform> <testform> <testform>)
  (:non-test (unfbind foo-tester) (unbind *foo-var*)))

I don't recall the actual unbind functions, but you get the idea.  Also there
would be other keyword stuff, like contributor, etc.  This could replace both
deftest and deftest-seq, but it would be difficult to exactly identify which
form failed in suych a way as to report that usefully to the person
interpreting the test results.

If I let the macro generate the test names, and the tests are all over the
place like they are here, telling you "Foo-11 failed" means you have to
manually count through this junk to the 11th test.  And if I also print the
offending expression, it could be a huge ugly thing which is as much trouble
as help.

That's why I wanted one test per deftest.

The deftest-seq thing CAN replace deftest, but I need a more explicit means of
identifying offending tests.

Any ideas?

As well since we are gonna have potentially a lot of "non-test" stuff, I
should also be able to identify the offender if one of these expressions cause
an error.

I am willing to adopt something like this if: a) other people express an
interest (are you listening???), and b) there is a good solution for the
identification problem.

I'll see what I can come up with.

Thanks,

RB

∂14-Oct-86  2136	@REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU 	Re: test stuff  
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 14 Oct 86  21:34:30 PDT
Received: from KAREN.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 6789; Wed 15-Oct-86 00:33:16 EDT
Date: Wed, 15 Oct 86 00:33 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: Re: test stuff
To: berman@vaxa.isi.edu, marick%cthulhu@GSWD-VMS.ARPA
cc: cl-validation@SU-AI.ARPA
In-Reply-To: <8610141935.AA03948@vaxa.isi.edu>
Message-ID: <861015003339.2.CFRY@KAREN.AI.MIT.EDU>

Your concern about printing out error messages with automatically generated
test names is valid. My suggestion of auto names was not a requirement,
only an option for the user to trade unspecificity for convienience.
Of course I expect the ISI test suite to have full names for each and every test.
My real concern here is that we'll have a nice test package, but it will
be too combersome to use for random coding because the start-up costs will be
too high. Thus people will continue to just type their tests into the listener
or in comments. Over the lifetime of the code that's being tested,
the hacker will write the same tests over and over, not realizing the advantage of
doing it THE RIGHT WAY in the first place. And if the right way has too much overhead,
it will be used very little indeed. If we permit things like test-name to default,
then a hacker can START using the test mechanism very easily.
The first time he gets an error message of the form test FOO-137 failed and has difficulty
finding that test, then, when he does find it, he can just stick in a nice semantically meaningful
name.

[If the mechanism really is used, someone will write a meta-. which heristically finds test foo-157
though.]

Having one test per NAME vs multi-tests per name is simply another trade-off
of lots of typing vs convienience. For official test suites, use 1 test per form.
For hacking around, allow the user to have many tests per form.

------
Printing an auto generated name combined with the form that errored may indeed
give you too much detail. So what you really want is a SUMMARY of the
errors [printed first] and then [maybe only by user request] print the gory details.

If what you're after is a really concise summary, auto names might be great:
Example: Failed tests: foo 2, 4,5,6,7,8,9 73  bar 1,2,3
An AI parser could print it as "foo 2, 4->9, 73, bar ALL."
------
I'd like to have a simple program implement testing.
But I'm willing to trade off a little hair in the testing program
to save some work for the diagnostic writer. I'd guess the
tests for CL Lists will contain about 5 times  more characters
than a medium hair testing program.

∂15-Oct-86  1103	berman@vaxa.isi.edu 	test names    
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 15 Oct 86  11:03:00 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA02010; Wed, 15 Oct 86 11:05:55 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8610151805.AA02010@vaxa.isi.edu>
Date: 15 Oct 1986 1105-PDT (Wednesday)
To: cl-validation@su-ai.arpa
Cc: 
Subject: test names



>Your concern about printing out error messages with automatically generated
>test names is valid. My suggestion of auto names was not a requirement,
>only an option for the user to trade unspecificity for convienience.
>Of course I expect the ISI test suite to have full names for each and every
>test.

My final idea was to allow automatically generated test names, but that these
names are generated by the database here at ISI when the tests are entered,
and when the tests come out of the database into a file, these names are
present and thus when the user sees this file it has names. 

>My real concern here is that we'll have a nice test package, but it will
>be too combersome to use for random coding because the start-up costs will be
>too high. Thus people will continue to just type their tests into the listener
>or in comments.

  Yeah, so, for hacking I can also allow test names to be generated at run
time, but these names will have a special form, and they would probably get
their evalforms printed if an error occured so the guy could find them in his
file.  Not I say *his* file 'cuz clearly this would be a non-ISI file.


> Having one test per NAME vs multi-tests per name is simply another trade-off
> of lots of typing vs convienience. For official test suites, use 1 test per
>form.
>For hacking around, allow the user to have many tests per form.

I'm not so sure.  The hacker would probably write his own macro which calls my
run-time macro (which looks identical to the define macro, but is called
RUNTEST and RUNTEST-SEQ), and so save himself some trouble.  And how many guys
can come up with a whole slew of tests of the top of their heads?  I still
think N=1 unless you want a test sequence.


>If what you're after is a really concise summary, auto names might be great:
>Example: Failed tests: foo 2, 4,5,6,7,8,9 73  bar 1,2,3
>An AI parser could print it as "foo 2, 4->9, 73, bar ALL."

Interesting thought, especially when lots of errors are expected.  I suspect
verbosity will have several levels of control.

Thanks.  Any more ideas/comments out there???

RB

∂15-Oct-86  1224	berman@vaxa.isi.edu 	testing side effects    
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 15 Oct 86  12:23:24 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA02964; Wed, 15 Oct 86 12:26:08 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8610151926.AA02964@vaxa.isi.edu>
Date: 15 Oct 1986 1226-PDT (Wednesday)
To: cl-validation@su-ai.arpa
Cc: mike%acorn@mit-live-oak.arpa
Subject: testing side effects


Thanks for that well thought out reply.  Here is an actual working example of
using the current testing stuff with your rplacd example:


(deftest
  (rplacd rplacd-1)
  (rplacd-test (cons 'a 'b) 'c)
  :setup (defun rplacd-test (x y)
	   (let* ((carx (car x))
		  (result (rplacd x y)))
	     (and (eq x result)
		  (eq (cdr result) y)
		  (eq (car result) carx))))
  :unsetup (fmakunbound rplacd-test)
  :contrib$ "Richard Berman @ ISI"
  :doc$ "Example side-effect testing predicate")

Now, for the sake of useability I would suggest that deftest-seq be used so
that rplacd-test could be tested with a number of tests without having to
duplicate it all over the place.  By the way, this is what I meant by "just
write the predicate as you've indicated and use it as the deftest predicate".

You have as much control as you like, and inside the predicate is a good place
for all kinds of complex stuff.  

Just remember that if the predicate bombs out, it is that particular test
which will report the failure.  So if the predicate is written wrong, test
results may be meaningless.  Not exactly the surprise statement of the decade.

Note that this is not too dissimilar from the "What I'd like to write..." form
in your message.  In fact, it could be even more similar.  The test form could
have been (EQ T <your form>) where <your form> is the lambda expression you
provided in your example.  This would eliminate the need for the :SETUP and
:UNSETUP clauses.

Best,

RB

∂22-Oct-86  1052	berman@vaxa.isi.edu 	unnamed tests 
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 22 Oct 86  10:52:28 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA03172; Wed, 22 Oct 86 10:52:04 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8610221752.AA03172@vaxa.isi.edu>
Date: 22 Oct 1986 1051-PDT (Wednesday)
To: cl-validation@su-ai.arpa
Cc: 
Subject: unnamed tests


Ok, so here's what I've done:

DEFTEST can accept a test without a name, in which case it uses the "item"
(which is the common lisp symbol of which this is a test, for example + or
READ, etc.) to construct a name which then ends up in the database.
DEFTEST-SEQ uses DEFTEST, so you don't have to supply test names there either.

When the database is queried, it responds by spewing out a bunch of tests to a
stream, formatted almost identically to DEFTEST and DEFTEST-SEQ, but with the
names RUNTEST and RUNTEST-SEQ.  The supplied/created test names appear here.
But you can directly enter your own RUNTESTs without names (as above).  At
runtime names are NOT created because it is useless.  The reference to a
runtime-created name doesn't help.  So if any error occurs in such a test, the
test manager reports something like "Unamed test of <item> caused an error..."
and says at what point (like when evaluating the evalform, or the compareform,
etc.) and then it displays the offending expression.  This is the only
definitive way I came up with that could help locate the actual test.

Flames?  Ideas?  Suggestions?  Huge stiffled yawns?

RB

∂02-Mar-87  1332	berman@vaxa.isi.edu 	Who's Where?  
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 2 Mar 87  13:32:12 PST
Received: by vaxa.isi.edu (4.12/4.7)
	id AA05893; Mon, 2 Mar 87 13:32:49 pst
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8703022132.AA05893@vaxa.isi.edu>
Date:  2 Mar 1987 1332-PST (Monday)
To: CL-VALIDATION@su-ai.arpa
Cc: 
Subject: Who's Where?


Hi all.  I seem to be having some trouble with net stuff, so could all of the
X3J13 validation members puh-lease send me your net addresses?  And actually
put it in the message rather than relying on automatic generation by the
mailer.

(Especially David Slater!)

Thanks.

RB

∂03-Mar-87  0820	@MIT-LIVE-OAK.ARPA,@GOLD-HILL-ACORN.LCS.MIT.EDU:mike%acorn@MIT-LIVE-OAK.ARPA 	whos where? 
Received: from LIVE-OAK.LCS.MIT.EDU by SAIL.STANFORD.EDU with TCP; 3 Mar 87  08:20:28 PST
Received: from GOLD-HILL-ACORN.DialNet.Symbolics.COM by MIT-LIVE-OAK.ARPA via DIAL with SMTP id 30931; 3 Mar 87 11:20:36-EST
Received: from BOSTON.Gold-Hill.DialNet.Symbolics.COM by ACORN.Gold-Hill.DialNet.Symbolics.COM via CHAOS with CHAOS-MAIL id 56866; Mon 2-Mar-87 20:01:49-EST
Date: Mon, 2 Mar 87 20:04 est
From: mike%acorn@oak.lcs.mit.edu
To: berman@vaxa.isi.edu (Richard Berman)
Reply-to: mike%acorn@oak.lcs.mit.edu
Subject: whos where?
Cc: CL-VALIDATION@su-ai.arpa

    From: berman@vaxa.isi.edu (Richard Berman)
    Date:  2 Mar 1987 1332-PST (Monday)
    
    
    Hi all.  I seem to be having some trouble with net stuff, so could all of the
    X3J13 validation members puh-lease send me your net addresses?  And actually
    put it in the message rather than relying on automatic generation by the
    mailer.
    
    (Especially David Slater!)
    
    Thanks.
    
    RB

I am right where I am:    
   mike%acorn@oak.lcs.mit.edu

...mike beckerle
Gold Hill Computers
    



∂03-Mar-87  1249	berman@vaxa.isi.edu 	Re: Who's Where?   
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 3 Mar 87  12:49:08 PST
Received: by vaxa.isi.edu (4.12/4.7)
	id AA17113; Tue, 3 Mar 87 12:49:32 pst
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8703032049.AA17113@vaxa.isi.edu>
Date:  3 Mar 1987 1249-PST (Tuesday)
To: berman@vaxa.isi.edu (Richard Berman)
Cc: CL-VALIDATION@su-ai.arpa, X3J13@sail.standford.edu
Subject: Re: Who's Where?
In-Reply-To: My message of 2 Mar 1987 1332-PST (Monday).
             <8703022132.AA05893@vaxa.isi.edu>


A few days ago I asked for the members of the X3J13 subgroup on validation to
please send me their current net address as part of a message (as opposed to
just having the mailer automatically generate one).  So far Mike Beckerle and
John Foderaro have responded.  Could Jon L. White and David Slater please
respond, as well as any others???

Thanks a bunch.

RB

∂03-Mar-87  1359	berman@vaxa.isi.edu 	Validation    
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 3 Mar 87  13:59:30 PST
Received: by vaxa.isi.edu (4.12/4.7)
	id AA17899; Tue, 3 Mar 87 13:59:42 pst
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8703032159.AA17899@vaxa.isi.edu>
Date:  3 Mar 1987 1359-PST (Tuesday)
To: Marick@GSWD-VMS.ARPA
Cc: berman@vaxa.isi.edu, cl-validation@su-ai.arpa, X3J13@SU-AI.arpa
Subject: Validation


> I'm not going to respond to your message until you respond to mine.

> I just spent a weekend rewriting sequence function tests and,
> consequently, rewriting some sequence functions.  I am more convinced
> than ever that a test suite should predate publication of a standard
> -- there's at least one other sequence function besides *ASSOC* that's
> underspecified.  (Unfortunately, I didn't write it down, and I've
> forgotten which one, now.)

> Does the deafening silence mean consent?

Brian, let me know if this gets through.  I have been responding to messages,
but sending them to marrick%cthulhu@gswd-vms.arpa has not worked.

I don't know that a test suite should *predate* standard publication, but it
should be part of that publication.  In fact, when formally published it would
be a good idea to for the standard and the test-suite to cross reference each
other.  That is, the each test should test a specific clause (or clauses?) of
the standard.  The standard should also include forms which must evaluate in a
certain way where this is meaningful to the definition of some part of the
standard.  In these cases the form would be part of the test suite.

HOWEVER....

I believe it is possible to have a useful test suite before the standard is
finalized and published.  I am preparing a report now for the X3 meeting that
will outline some possible stages in the development of the test suite.  Part
of our problem is that we are aiming at a moving target...

Best,

RB

∂07-May-87  1302	berman@vaxa.isi.edu 	Re: test suite
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 7 May 87  13:02:10 PDT
Received: by vaxa.isi.edu (4.12/4.7)
	id AA10111; Thu, 7 May 87 13:01:59 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8705072001.AA10111@vaxa.isi.edu>
Date:  7 May 1987 1301-PDT (Thursday)
To: cl-validation@sail.stanford.edu
Subject: Re: test suite


This is in response to David's comments about having gotten the testing stuff
from venera.isi.edu, only to find it riddled with non-CL-ness and other
stupidities.  


David,

Sorry, but that whole first version was there specifically to just get
*something* out.  It was not really ready and I thought that was made clear.
Since then it has been updated a lot, and I think another interim version got
to the FTP directory.  I have a version which I am currently testing with
about 3300 tests.  Both errors in the tests and in the test controller are
being debugged.  The test controller has been stable for about a week now, but
there are still tests that were incorrectly converted from the original.

As for being "Common Lisp", the current test code should be in CL.  It was
tested by loading on three different implementations (two of which, it turned
out, allowed all of those non-CL type things you found.  It wasn't until the
third and final test that most of these things were routed out).

This stuff is usually noted on the CL-VALIDATION mailling list, which you can
get on the same way you got onto Common-Lisp.

When this stuff is ready (another week or so) the whole thing will be made
available and the updated documentation will also be ready.

I am *really* sorry about the hassle, but that was one of the purposes and
risks of taking that software.  I hope the next release will be more along the
lines of what you need.

Caveat:  The development paradigm for the tests is a little wierd.  It goes
like this.  

1.  I have 20,000 tests sitting in files in various formats from various
vendors.

2.  These need to be made available, either by converting them or writing some
sort of interface to my stuff.

3.  Usually they are convertible, starting with automation and finishing with
a lot of cleanup by hand.

4.  This process leaves behinds tests which may not actually test CL stuff, or
may test it incorrectly.

5.  As *most* of these tests are ok, they are released any way.

6.  You get these tests.

7.  You run them.

8.  Some of them may actually bomb (i.e. totally crash).  In such cases, even
if the test is incorrect for some reason, there is also a bug in the
implementation.

9.  If you disagree with some test on any basis, don't get into a sweat.  Just
let me know which test and why.  If it is clearly not a test of CL, I'll
remove/repair it.  If it is ambiguous, it will be left in with a note to the
effect, and you may ignore its results until higher authorities can figure out
if it should be left in, and why.

So the users of the test suite are EXPECTED to participate in the development
of the test suite.  At a near later date, when a very thorough test suite is
released, another facet comes into play.  That is, where the test suite does
not report some error that another method locates.  I expect this other method
will probably be some application program bombing.  Users should definitely
report the exact area of incorrect implementation that was missed by the test
suite.  Preferably, this would include some code (in the test format, please!)
that would detect the condition.

And Thus It Will Come To Pass That The Test Suite Shall Approach Total
Completeness

I hope this helps.  

RB




∂20-May-87  1226	berman@vaxa.isi.edu 
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 20 May 87  12:26:15 PDT
Posted-Date: Wed, 20 May 87 12:24:36 PDT
Message-Id: <8705201924.AA03002@vaxa.isi.edu>
Received: from LOCALHOST by vaxa.isi.edu (5.54/5.51)
	id AA03002; Wed, 20 May 87 12:24:40 PDT
To: las@bfly-vax.bbn.com
Cc: berman@vaxa.isi.edu, cl-validation@sail.stanford.edu
In-Reply-To: Your message of Wed, 20 May 87 15:11:55 -0400.
             <8705201913.AA02866@vaxa.isi.edu> 
Date: Wed, 20 May 87 12:24:36 PDT
From: Richard Berman <berman@vaxa.isi.edu>

Subscribe to CL-VALIDATION (via RPG).  See if you can get its "archives",
which would be very short.  

It has all the info, but, briefly:

There IS an FTPable version, but it is not 100% common lisp, and my working
version is very much changed, bug-fixed, etc.  I am in the process of using a
3300 test suite (converted from part of HP's test suite) to develop the test
suite code.  It is an evolutionary process on 3 fronts:

1.  The test suite code itself is being bug-fixed by running these tests.

2.  Bugs in the suite of tests are being found (i.e. tests that don't really
test CL feaures correctly)

3.  Non-CLness in the test-suite (as opposed to outright bugs) is being found
and eradicated, as well as some enhanced features.

When the 3300 tests all run with no errors in the test suite code (or, at
least they report errors that really are in the implementation being used as a
testbed) I will release the 3300 tests.  Probably before that I will be
confident enough that the test suite manager is correct to release the manager
and a subset of the tests.

RB

∂21-May-87  1307	berman@vaxa.isi.edu 	Re: :fill-pointer maybe 
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 21 May 87  13:07:48 PDT
Posted-Date: Thu, 21 May 87 13:07:00 PDT
Message-Id: <8705212007.AA11632@vaxa.isi.edu>
Received: from LOCALHOST by vaxa.isi.edu (5.54/5.51)
	id AA11632; Thu, 21 May 87 13:07:03 PDT
To: "Scott E. Fahlman" <Fahlman@c.cs.cmu.edu>
Cc: Richard Berman <berman@vaxa.isi.edu>, cl-validation@sail.stanford.edu
Subject: Re: :fill-pointer maybe 
In-Reply-To: Your message of Thu, 21 May 87 15:20:00 -0400.
             <FAHLMAN.12304201957.BABYL@C.CS.CMU.EDU> 
Date: Thu, 21 May 87 13:07:00 PDT
From: Richard Berman <berman@vaxa.isi.edu>


Scott, I still have a problem with the sheer amount of "unpecified" stuff that
therefore "is an error".

For example, I am trying to get out some tests on arrays.  I find that nearly
everything that is trying to create an error condition really creates
is-an-error conditions.  Like wrong number of subscripts to AREF.  Like
negative or fractional subscripts.  Like subscripts out of range.  Problem is,
I actually have an implementation that is returning stuff for out-of-range
subscript.  I can't tell if it's a bug or not!

If it is definite that all this stuff really "is an error", I will ahead and
either remove the tests or change them to my new :IS-ERROR test type, but not
with a big smile.

RB

∂25-Jun-87  0711	las@bfly-vax.bbn.com 	Just a test...    
Received: from BFLY-VAX.BBN.COM by SAIL.STANFORD.EDU with TCP; 25 Jun 87  07:11:24 PDT
To: cl-validation@sail.stanford.edu
Subject: Just a test...
Date: 25 Jun 87 10:04:34 EDT (Thu)
From: las@bfly-vax.bbn.com

Just a test.

∂29-Jun-87  1255	berman@vaxa.isi.edu 	new suite
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 29 Jun 87  12:55:11 PDT
Posted-Date: Mon, 29 Jun 87 12:54:08 PDT
Message-Id: <8706291954.AA01008@vaxa.isi.edu>
Received: from LOCALHOST by vaxa.isi.edu (5.54/5.51)
	id AA01008; Mon, 29 Jun 87 12:54:15 PDT
To: CL-Validation@sail.stanford.edu
Cc: LAS@bfly-vax.bbn.com
Subject: new suite
Date: Mon, 29 Jun 87 12:54:08 PDT
From: Richard Berman <berman@vaxa.isi.edu>


I have heard this message didn't get around for some reason.  The address is
old, but I received no indication from the mailer that it didn't work, so...

------- Forwarded Message

Return-Path: berman@vaxa.isi.edu
Posted-Date: Thu, 18 Jun 87 12:54:29 PDT
Received-Date: Thu, 18 Jun 87 12:54:42 PDT
Message-Id: <8706181954.AA01441@vaxa.isi.edu>
Received: from LOCALHOST by vaxa.isi.edu (5.54/5.51)
	id AA01441; Thu, 18 Jun 87 12:54:42 PDT
To: CL-VALIDATION@su.ai.arpa
Cc: berman@vaxa.isi.edu
Subject: new stuff
Date: Thu, 18 Jun 87 12:54:29 PDT
From: Richard Berman <berman@vaxa.isi.edu>

The long awaited bigger test suite and better code is now available!


In the Valid-Code directory on ISI.EDU (used to be called VENERA.ISI.EDU) is a
file called README.NOW.  Get this.  You will also need a new APP-B (and all
the other stuff if you haven't gotten it yet).  At the time of this posting,
the actual test source files (that is, files containing test definitions)
haven't beean posted, but soon they will be in VALID-TESTS.  For now, the main
change is in the source code in Valid-Code, which you should get.  It ought to
be much more compatible, more bugfree and with additional features.

In Valid-Tests, in addition to all the old stuff, you will find TESTS1.LISP,
which is a Run-test file containing over 3000 new tests.

There is much more in the text files.  

Best,

RB



------- End of Forwarded Message

∂30-Jun-87  1402	berman@vaxa.isi.edu 	More Tests    
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 30 Jun 87  14:02:12 PDT
Posted-Date: Tue, 30 Jun 87 14:01:08 PDT
Message-Id: <8706302101.AA25655@vaxa.isi.edu>
Received: from LOCALHOST by vaxa.isi.edu (5.54/5.51)
	id AA25655; Tue, 30 Jun 87 14:01:10 PDT
To: CL-VALIDATION@sail.stanford.edu
Subject: More Tests
Date: Tue, 30 Jun 87 14:01:08 PDT
From: Richard Berman <berman@vaxa.isi.edu>


Hello all.  I currently have over 400 more tests, and more coming, ready to
go.  Before I will make another release, I need feedback.  So...

Have you FTP'd any of the current stuff?  

Have you looked at it?

What did you think?

Did you try to use it?

What happened?

What was the results (good/bad) of any testing you did?

Comments?


Please fill out the above in quadruplicate.  But all seriousness aside, I DO
need some feedback before I release a whole slew of new stuff.

Best,

RB

∂15-Jul-87  1051	berman@vaxa.isi.edu 	New Stuff
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 15 Jul 87  10:51:21 PDT
Posted-Date: Wed, 15 Jul 87 10:51:31 PDT
Message-Id: <8707151751.AA02798@vaxa.isi.edu>
Received: from LOCALHOST by vaxa.isi.edu (5.54/5.51)
	id AA02798; Wed, 15 Jul 87 10:51:33 PDT
To: CL-VALIDATION@sail.stanford.edu
Subject: New Stuff
Date: Wed, 15 Jul 87 10:51:31 PDT
From: Richard Berman <berman@vaxa.isi.edu>


There is now new versions of INTERN-DATA.LISP and TEST-MACRO.LISP in the
Valid-Code directory.  These function the same, but are more common-lisp
compatible than the earlier versions.

RB

∂17-Jul-87  1130	berman@vaxa.isi.edu 	Leaving...    
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 17 Jul 87  11:30:41 PDT
Posted-Date: Fri, 17 Jul 87 11:30:50 PDT
Message-Id: <8707171830.AA04027@vaxa.isi.edu>
Received: from LOCALHOST by vaxa.isi.edu (5.54/5.51)
	id AA04027; Fri, 17 Jul 87 11:30:53 PDT
To: CL-Validation@sail.stanford.edu
Subject: Leaving...
Date: Fri, 17 Jul 87 11:30:50 PDT
From: Richard Berman <berman@vaxa.isi.edu>


Hi all.

I am very sorry to say that I am leaving ISI and will no longer be able to
continue work with the test suite.  It is not yet certain just who will be
continuing this valuable work, but I am sure that soon a message to this
effect will be posted.

Really vital (!) questions can be directed to Bob Balzer,  Balzer@vaxa.isi.edu

Best wishes, and I will try to stay in touch.

RB

∂25-Sep-87  1504	@BCO-MULTICS.ARPA:May.KBS@HIS-PHOENIX-MULTICS.ARPA 	Mailing List Status Check   
Received: from BCO-MULTICS.ARPA by SAIL.STANFORD.EDU with TCP; 25 Sep 87  15:04:08 PDT
Received: FROM HIS-PHOENIX-MULTICS.ARPA BY BCO-MULTICS.ARPA WITH dial; 25 SEP 1987 17:59:07 EDT
Posted-Date:  25 Sep 87 14:57 MST
Date:  Fri, 25 Sep 87 14:56 MST
From:  Bob May <May@HIS-PHOENIX-MULTICS.ARPA>
Subject:  Mailing List Status Check
Reply-To:  May%pco@BCO-MULTICS.ARPA
To:  <@BCO-MULTICS.ARPA:CL-VALIDATION@SAIL.STANFORD.EDU>
Message-ID:  <870925215656.078583@HIS-PHOENIX-MULTICS.ARPA>

Hello.  Would the chairperson please send me mail about joining the
mailing list and obtaining the archive files?  Thanks.  Sorry if this
isn't the right place for this request.  This is the only address I
could find.