perm filename STRGEN.PUB[DOC,LMM] blob sn#045526 filedate 1973-05-17 generic text, type C, neo UTF8

COMMENT ⊗ VALID 00028 PAGES C REC PAGE DESCRIPTION C00001 00001 VALID 00028 PAGES C00004 00002 .<< TITLE PAGE, INITIAL DECLARATIONS >> C00006 00003 .<< PUB MACROS >> C00008 00004 Problems of structural isomerism in chemistry have received much C00012 00005 Central to the successful solution of this problem is the generation C00016 00006 .TITL METHOD C00027 00007 .HD STRATEGY C00029 00008 .HD DESCRIPTION C00034 00009 .BEGIN TABLE I,Partitioning Steps Performed by the Cyclic Structure Generator C00036 00010 .HD PART A. Superatom Partitions. C00039 00011 .BEGIN TABLE II,Allowed Partitions of C↓6U↓3 Into Superatompots and Remaining Pot. C00041 00012 .HD PART B. Ring-superatom Construction. C00050 00013 With the reduced degree list of a superatompot, the program requests C00062 00014 .HD PART C. Acyclic Generator. C00063 00015 .TITL DISCUSSION C00072 00016 .BEGIN TABLE VI,The Number of Isomers for Several Empirical Formulae C00074 00017 .GRAF Constraints. C00078 00018 .HD CONCLUSIONS C00079 00019 .SPACE1 C00082 00020 .HD Appendix B. Isomerism and Symmetry. C00090 00021 .HD Appendix C C00106 00022 .HD Appendix D - Acyclic generator C00108 00023 .HD D1. Method for the case with even number of total atoms. C00110 00024 .HD |Category A. BOND CENTROID (see Fig. 3)| C00112 00025 .BEGIN TABLE VIII, Radicals Constructable from Given Parts C00114 00026 .HD |Category B. ATOM CENTROID (see Fig. 4).| C00119 00027 .HD |D2. Method for odd number of total atoms.| C00126 00028 .HD Summary C00127 ENDMK C⊗; .<< TITLE PAGE, INITIAL DECLARATIONS >> .DEVICE TTY .PAGE FRAME 53 HIGH 75 WIDE .AREA TEXT LINES 4 TO 51 CHARS 1 TO 69 .FOOTSEP←"------------------"; .BEGIN CENTER APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI Exhaustive Generation of Cyclic and Acyclic Isomers.(1,2) L. Masinter N.S. Sridharan J. Lederberg D.H. Smith Stanford University Stanford, California May, 1973 .END .GROUP SKIP 5 .BEGIN INDENT 6,6,6 ABSTRACT. A systematic method of identification of all possible graph isomers consistent with a given empirical formula is described. The method, embodied in a computer program, generates a complete list of isomers. Duplicate structures are avoided prospectively. .END .GROUP SKIP 10 .BEGIN VERBATIM ------------------ .END 1) For Part XI see R. Carhart and C. Djerassi, λJ. Chem. Soc., Perkin II,β submitted, 1973. 2) Financial support for this work was provided by the National Institutes of Health (RR 00612-03) and the Advanced Research Projects Agency (SD-183). .<< PUB MACROS >> .EVERY FOOTING(,{PAGE},) . .MACRO LMCOMMENT (REMARK) ⊂⊃ . .MACRO HD (HEADING) ⊂ BEGIN .IF LINES<10 THEN NEXT PAGE ELSE SKIP 2; .NOFILL;BREAK; αHEADINGβ .END .ONCE PREFACE 0;⊃ . .MACRO TITL (TITLE) ⊂ BEGIN .SKIP 2; NOFILL; BREAK;CENTER; αTITLEβ .END;⊃ . .MACRO GRAF (HDNG) ⊂ αHDNGβ .⊃ . .MACRO SPACE1 ⊂ SPREAD←1; PREFACE 2; BREAK; ⊃ . .MACRO SPACE2 ⊂ SPREAD←2; PREFACE 3; BREAK; ⊃ . .MACRO TABLE (N,TITLE) ⊂ NOFILL;GROUP;SINGLE SPACE; .BEGIN FILL;PREFACE 0; SKIP 1; -------------------------------------------------------------------- Table N. TITLE .END;⊃ . .TURNON "↑↓[]&→" .SKIP TO COLUMN 1 .SPACE2; Problems of structural isomerism in chemistry have received much attention. But only occasional inroads have been made toward a systematic solution of the underlying graph theoretical problems of structural isomerism. Recently the "boundaries, scope and limits" (3) .SEND FOOT ⊂ .BEGIN SPACE1; INDENT 0,0; 3) J. Lederberg, G.L. Sutherland, B.G. Buchanan, E.A. Feigenbaum, A.V. Robertson, A.M. Duffield, and C. Djerassi, λJ. Amer. Chem. Soc., α91β, 2973 (1969). .END; .⊃; of the subject of structural isomerism of acyclic molecules have been defined by the DENDRAL algorithm (3). This algorithm permits an enumeration and representation of all possible acyclic molecular structures with a given molecular formula. Acyclic molecules represent only a subset of molecular structures, however, and it may be argued that cyclic structures (including those posessing acyclic chains) are of more general interest and importance to modern chemistry. An approach to cyclic structure generation has appeared in a previous paper in this series (4). .SEND FOOT ⊂ .BEGIN SPACE1; INDENT 0,0; 4) Y.M. Sheikh, A. Buchs, A.B. Delfino, G. Schroll, A.M. Duffield, C. Djerassi, B.G. Buchanan, G.L. Sutherland, E.A. Feigenbaum, and J. Lederberg, λOrg. Mass Spectrom.β, α4β, 493 (1970). .END; .⊃; That approach, which operates on a set of previously generated acyclic forms by labelling hydrogen atoms pairwise and connecting the atoms to which they are attached with a new bond, has one serious drawback. The approach cannot make efficient use of the symmetry properties of cyclic graphs; hence an inordinate amount of computer time must be spent in retrospective checking of each candidate structure with existing structures to remove duplicates. For this reason, an alternative approach to construction of cyclic molecules has been developed. This approach is designed to take advantage of the underlying graph theoretic considerations, primarily symmetry, to arrive at a method for more efficient construction of a complete and irredundant list of isomers for a given empirical formula. Central to the successful solution of this problem is the generation of all positional isomers obtained by substitutions on a given ring system. This topic has received attention for nearly 100 years, with limited success (5). .SEND FOOT ⊂ .BEGIN SPACE1; INDENT 0,0; 5) See, for example, A.C. Lunn and J.K. Senior, λJ. Phys. Chem.β, α33β, 1027 (1929) and references cited therein. .END; .⊃; Its more general ramifications go far beyond organic chemistry. Graph theoreticians have considered various aspects of this topic, frequently, but not necessarily, in the context of organic molecules. Polya has presented a theorem (6) .SEND FOOT ⊂ .BEGIN SPACE1; NOFILL; 6) a) G. Polya, λCompt. rend.β, α201β, 1167 (1935); b) G. Polya, λHelv. Chim. Actaβ, α19β, 22 (1936); c) G. Polya, λZ. Kryst.β α92β, 415 (1936); d) G. Polya, λActa Math.β, α68β, 145 (1937). .END .⊃; which permits calculation of the number of structural isomers for a given ring system. Hill (7a,b) .SEND FOOT ⊂ .BEGIN SPACE1; NOFILL; 7) a) T.L. Hill, λJ. Phys. Chem.β, α47β, 253 (1943); b) T.L. Hill, λibid.β, p. 413. .END; .⊃; has applied this theorem to enumeration of isomers of simple ring compounds and Hill (7c) and Taylor (8) .SEND FOOT ⊂ .BEGIN SPACE1; NOFILL; c) T.L. Hill, λJ. Chem. Phys.β, α11β, 294 (1943). 8) W.J. Taylor λJ. Chem. Phys.β, α11β, 532 (1943). .END .⊃; have pointed out that Polya's theorem permits enumeration of geometrical and optical isomers in addition to structural isomers. More recently, formulae for enumeration of isomers of monocyclic aromatic compounds based on graph theory, permutation groups and Polya's theorem have been presented (9). .SEND FOOT ⊂ .BEGIN SPACE1; INDENT 0,0; 9) A.T. Balaban and F. Harary, λRev. Rumaine de Chimieβ, α12β, 1511 (1967). .END .⊃; This history of interest and results provides only marginal benefit to the organic chemist. Although the number of isomers may be interesting, these methods (5-9) do not display the structure of each isomer. Also, these methods do not provide information on the more general case where the ring system is embedded in a more complex structure. Even for simple cases the task of specifying each structure by hand, without duplication, is an onerous one. .TITL METHOD; .HD OVERVIEW; .GRAF Framework. The framework for this method is that chemical structures consist of some combination of acyclic chains and rings or ring systems (10,11). .SEND FOOT ⊂ .BEGIN SPACE1; 10) J. Lederberg, DENDRAL-64, A system for Computer Construction, Enumeration, and Notation of Organic Molecules, Interim report to NASA, Parts I, II, and III, 1965. 11) It is assumed that structures are completely connected by chemical bonds; thus catenates and threaded structures are viewed as consisting of separate molecules. .END; .⊃; The problem of construction of acyclic isomers (and radicals) has been solved previously (3). If all possible ring systems can be constructed from all or part of the atoms in the empirical formula, and all possible acyclic parts are available from the acyclic generator, the combination of ring systems with acyclic parts in all unique ways would yield the complete list of isomers. The method for construction of ring systems is described below. This description employs some terms which require definition. The definitions also serve to illustrate the taxonomic principles which underlie the operation of the cyclic structure generator. The generator's view of molecular structure differs in some respects from the chemist's. A chemist, for example, may view structures possessing the same functional group or ring as related. The generator works at the more fundamental level of the vertex-graph (10), as described below. .GRAF Chemical Graph. A molecular structure may be viewed as a graph, termed the λchemical graphβ, or skeleton. A chemical graph consists of λnodesβ, with associated atom names, and λedgesβ, which correspond to chemical bonds. Consider as an example the substituted piperazine, αIβ, whose chemical graph is illustrated in Chart I as αII.β Note that hydrogen atoms are ignored by convention, while the symbol "U" is used to specify the unsaturation. The degree (primary, secondary, ...) of a node in the chemical graph has its usual meaning, i.e., the number of (non-hydrogen) edges connected to it. The valence of each atom determines its maximum degree in the graph. As usally displayed by chemists in planar representation, the chemical graph describes the connectivity rather than the geometric configuration of a molecular structure. .GRAF Superatom. In general, a chemical graph can be separated into cyclic and acyclic parts. Each cyclic structural sub-unit may be deemed a λsuperatomβ possessing any number of λfree valencesβ (12). .SEND FOOT ⊂ .BEGIN SPACE1; INDENT 0,0; 12) A free valence is a bond with an unspecified terminus. Any substructure, cyclic or not, may be treated as a superatom; however, the term, in this paper, is generally restricted to cyclic (termed ring-) superatoms. .END; .⊃; The chemical graph α2β arises from a combination of two carbon atoms with ring-superatom α3.β Ring-superatom α3β possesses the indicated free valences to which the remaining hydrogen and two methyl radicals will be attached (Chart I). .PAGE←PAGE+1; .<< THIS SAYS RESERVE NEXT PAGE NUMBER (FOR CHART I) >> .GRAF Ciliated Skeletons. A λciliated skeletonβ is a skeleton with free valences. Ring-superatom α3β arises from the ciliated skeleton α4β by associating the atom names of eight carbon and two nitrogen atoms with the skeleton (Chart I). .GRAF Cyclic Skeleton. A chemical graph whose nodes are not associated with atom names and which contain no acyclic parts and no free valences is termed a λcyclic skeleton.β Ciliated skeleton α4β arises from one way of associating sixteen free valences with the nodes on the cyclic skeleton α4β (Chart I). .GRAF Vertex-Graph. Vertex-graphs (10) are cyclic skeletons from which nodes of degree less than three have been deleted. The vertex-graph of the cyclic skeleton α5β is the regular trivalent graph (11) of two nodes, α6.β Note that the remaining nodes of the cyclic skeleton α5β are of degree two. Removal of these secondary nodes from α5β while retaining the interconnections of the two tertiary nodes yields α6β (Chart I). As an illustration of the variety of structures which may be constructed from a given vertex-graph and empirical formula, for example, C↓1↓0H↓2↓0N↓2, consider that graph α6β is the vertex-graph for all bicyclic ring systems (excluding spiro forms). Cyclic skeletons α7β and α8β (Chart I), for example, may be constructed from eight secondary nodes and α6.β There are many ways of associating sixteen free valences with each cyclic skeleton, resulting in a larger number of ciliated skeletons. For example, α9β and α10β arise from different allocations of sixteen free valences to α5β (Chart I). There is only one way to associate eight carbon atoms and two nitrogen atoms with each ciliated skeleton to yield superatoms (e.g. α11β and α12β, Chart I). However, several structures are obtained by associating the remaining two carbon atoms (in this example) with each superatom, as an ethyl or two methyl groups. Chemical graphs α13β and α14β, for example, arise from two alternative ways of associating two methyl groups with superatom α3.β .GRAF Multiple Bonds. For the purposes of this program we adopt the formalism that all multiple bonds (double, triple, ...) are considered to be small rings by the program. Previous versions (acyclic generator) differ from this program in that double and triple bonds are regarded as specially labelled edges. .HD AIMS The cyclic structure generator must produce a complete list of structures without duplication. By duplicate structures we mean structures which are equivalent in some well-defined sense. The class of isomers generated by the program includes only connectivity isomers. Transformations allowed under connectivity symmetry preserve the valence and bond distribution of every atom. Connectivity symmetry does not consider bond lengths or bond angles. This choice of symmetry results in construction of a set of topologically unique isomers. A more detailed discussion of eqivalence is discussed in Appendix A and in the accompanying paper (13); .SEND FOOT ⊂ .BEGIN SPACE1; INDENT 0,0; 13) L. Masinter, N.S.Sridharan, Labelling Objects with Symmetry, to appear. .END; .⊃; a discussion of isomerism and symmetry is presented in Appendix B. .HD STRATEGY The strategy behind the cyclic structure generator is strongly tied to the framework described above. The strategy is summarized in greatly simplified form in Figure 1. The vertex- graphs from which structures are constructed can be specified for a given problem by a series of calculations. Thus Part A of the program (Figure 1) partitions the pot of atoms in all possible ways; each partition consists of those atoms assigned to one or more "superatom-pots" and a "remaining pot." Each superatompot is a collection of atoms from which all possible, unique ring-superatoms (12) can be constructed based on the appropriate vertex-graphs (Part B, Fig. 1). Each ring-superatom will be a ring system in completed structures. The atoms in the remaining pot will form acyclic parts of the final structures when combined in all possible, unique ways with the ring-superatoms from the corresponding initial partition (Part C, Fig. 1). .HD DESCRIPTION We are faced with the difficulty of describing a complex computer program in the traditional mode of presentation in a scientific journal. The narrative form is not the ideal medium for this description; simple examples do not always indicate all essential aspects of a program. A deeper understanding of a program could be engendered through the use of a large number of well chosen examples, but the length of such a presentation would be excessive and would tax the patience of even the most interested reader. We are thus aware of the insufficiency of considering only one example in the following written description. We have adopted the strategy of presenting essential aspects of the procedure for structure generation in the main body of the text. Details of the description which might obscure the principle concepts are placed in Appendices C and D. Mathematical details are available elsewhere (14,15). .SEND FOOT ⊂ .BEGIN SPACE1; NOFILL; 14a) H. Brown, L. Masinter, and L.Hjelmelend, Constructive Graph Labelling Using Double Cosets; λDiscrete Mathematicsβ, in press; b) Stanford Computer Science Memo STAN-CS-72-0318. 15) H. Brown and L. Masinter, An Algorithm for the Generation of Graphs of Organic Molecules; Discrete Mathematics, submitted. .END; .⊃; We hope this serves the purpose of providing the casual reader with a deeper understanding of the method without having to contend with details which, on the other hand, are important to others who wish to make use of our approach. The example chosen to illustrate each step of the method is C↓6H↓0⊂8~∃)Q%fAKq¬[aYJ↓I←Kf↓]←hA ←]iC%\AESYCYK]PA←dAQeSmC1K]hA¬i←[fQJ]N8X~∃←asOK\↓C]HA9Sie←≥K\XAIKgaK iSmK1rRA←HACi←5fA←L↓mCYK9GJAOIKCiKHAiQC8~∃M←UdXA]=dAC]dAk]SYCYK]PACi←5fA←i!KdAi!C\AQeIe←O∃\@QJ9N\XA QY←e%]JX~)MYk←IS]JR8~∀~∀9∂%β↓!Cei%iS←]%]NAC9HA→C KYYS9N\~∃QQJA[∃GQC]%gZAM=dAgiIkGikIJAOK9KeCi%←\~∃%]m←YYKfAB↓gKeS∃fA←LEaCeQSiS←9S]ND↓giKaLAM←Y1←oKH↓ErAB↓gKeS∃fA←L4∀EYC KYYS9NDAgQKaf\A!CeQSiS←9fACe∀A[CI∀A←LA%iK[f↓oQSG A[kgPAEJ~)CggS≥]KHAQ↑A←E)KGifQkgk¬YYrA≥eCaP↓giek ikeKLA←dAACeif↓iQKe∃←LR~)CfAi!JA[←1KGkY¬dAgiIkGikIKfACIJAEk%YhAk@AMe←4AiQJ↓mKei∃p[Oe¬aQf\4∃)QJ↓ae←G∃gfAEdAoQS PASi∃[fACIJACgMSO]K⊂Ai↑AQQJAOICaQf↓SfAi∃e[KH4∃YCE∃YYS]≤@Pbf0bhR\AqC5S]Ci%←\A←_AπQCIhA∩AIKmKC1fAiQ∀AISM→KeK]P~∃isAKfA←_ASiK5fAS]Y←YmK⊂\@A =dAKq¬[aYJ0A]←I∃fACe∀AaCeQSiS←9KHAC5←]NA¬]H~∃1CEKY1KHAkA←\Ai!JAKI≥KfA←_AiQJ↓mKei∃p[Oe¬aQfAQ↑AsS∃YHAi!J~∃GeGYSF↓gWKY∃i←]f8@A e∃JAmC1K]GKLACeJ↓aCei%iS←]∃HAC[=]NAC9H~∃Y¬EKYY∃HAka=\AiQ∀A]←I∃fA←L↓GsGY%FAgW∃YKi←9fAi↑↓sSKY⊂AGSY%CiKH4∃gWK1Ki←]LXAC]⊂Ag↑A→←eiP8~∀~∃ACeiSQS←]S9NAgi∃afAS8AiQJ↓gkEg∃ckK]PAISg kggS=\ACe∀AGCeISKHA=kh~∃¬ggk[%]NAi!ChA← UKGiLAC[←9NAoQ%GPASQK[fA¬eJAa¬eiSi%←]KH↓CeJA%]ISgPZ~∃S9OkSg!CEYJ8@A SMiS]OUSgQC SYSidA←LA=EUKGQf@QK⊃OKfX↓]←IKLX@\\8RASf4∃gaK SMSK⊂AIke%]NAY¬EKYY%]NAC9HAoS1XAEJ↓ISgGUggKH↓S\AB↓gkEg∃ckK]P~∃gK iS←\8@A)Q∀AaCeQSiS←9S]NAMiKaf↓aKeM=e[KH↓ErAi!JAae=OeCZ↓CeJ~)←kiY%]KHA%\A)C YJA∩8ACG AgiK@ASfA⊃KgGe%EKHA%\A[←IJAIKQCSXA KY←n8~∀_]¬≥∪≤A)¬¬→ A$Y!CeQSiS←9S]NAMiKaf↓!KeM=e[KH↓ErAi!JAπs YSFAMiekGQkeJA≥K]Ke¬i←d~(~∃'i∃`@F@@@@@@@@@↓!Cei%iS←\@@@@@@@@@@@@Aβ[←9N~∀~(@@@b@@@@@@@@↓βi←[LAC]H↓+]gCQkeCi%←]f@↓'kaKICi←[A←ifA¬]H~∀@@@@@@@@@@@A%\A[ASeSG¬XA ←I[kYB@@@AIK[CS9S]NAA←h\~(~∀@@d@@@@@@@@A e∃JA-C1K]GJ@@@@@@@@@Aβi=[fAS8A'ka∃eCi←5a←h~(~∀@@f@@@@@@@@A'K ←]ICIrA≥←⊃Kf@@@@@@@A→←=af@↑↓≥←\[1←←af4∀~∀@@h@@@@@@@@A≥=\[Y←=`A'K ←]ICIr@@@@@A⊃OKfA=LA∂e¬aP~∀@@@@@@@@@@@A9←IKf4∀~∀@@l@@@@@@@@A%%]N[gUaKeCQ←[fA¬]H@@@@A→MKeK9hA→S9Wf~∀@@@@@@@@@@@AIK[CS9S]NAA←h@@@@@@@@@@!gKJA¬aaK]⊃SpAλ$~∀~∀4ZZZZ4ZZZZ4ZZZZ4ZZZZ4ZZZZ4ZZZZ4ZZZZ4ZZZZ4ZZZZ4ZZZZ4ZZZZ4ZZZZ4ZZZZ4ZZ~∀9≥λ~(_]⊃λ↓!β%(↓α\@AMkaKe¬i←ZAACeiSQS←]f8~∃%S9N[gkAKeCi=[fACIJ@Ei]↑[G←9]KGi∃HDAgQekGiUeKfX↓R]J\0AiQJ4∃eS]≤[gka∃eCi←4AGC]9←hAE∀AgaY%hAS]Q↑Aio<AaCeQfAEr↓gGSgMS←\A=LAB~)gS]O1JAE←9H\@AQQJACQ←[fA%\AC\↓K[aSISGCX↓M←e[UYBA[¬rAEJ↓ISgiISEki∃H~∃C5←]NA→e←ZA=]JAi<AgKm∃eCXAMkGPAQo↑[G=]]KGQKHAe%]N[gUaKeCQ←[f\Aα~∃⊃Sgie%EkiS=\AoQ%GPAC1Y←if↓Ci←[LAi↑AQo↑A←HA[←e∀Agka∃eCi←5a←if↓oSYX4∃sSK1H@Qe∃gaKGQSmKYdRAgiIkGikIKfAG=]iCS9S]NAQo↑A←HA[←e∀~∃eS9N[gkAKeCi=[fAY%]WKH↓i←OKQQKdA rAgS9OYJA ←]IfQ←dA¬GsGY%FAGQ¬S]fR4∀Pbl$\~∀]M≥λA→∨∨(@@@~∀] ∂∪≤↓'!βπ∀bvA∪9 ≥(`X`v4∀blRAπQK5Sgif↓CeJA5←eJA→C[SY%CdAo%iPAi∃e[fAMkGPA¬fAeS9OfA←HAeS]≤~∃gsMiK[f8@A)Q∀AiKe4Aio↑5G←]]∃GiKH↓SfAkMKHAQ∃eJAS8AG←])k]Gi%←\~∃]SiPAIS]N[MkaKe¬i←[f↓M←dA∧A[←e∀AaeK SgJA⊃KgGe%aiS←8\@A =dAKq¬[aYJ0~∃ESAQK]s0A[Cr↓EJAm%KoKH↓CfAB↓gS]O1JAeS9NAgsMiKZA=dAio<AeS]≥fAIKAK]IS9N~∃←8AiQJ↓GQK[%GCXA ←]iKah\@A%\AiQ%fAo←IVXAQ=oKmKHXAESAQK]s0AG←]MSgif↓←L~∃Qo↑Ae%]N[gUaKeCQ←[f@!io↑AAQK]s0AeS]≥fRAY%]WKH↓ErAB↓gS]O1JAE←9H\~∀9≥λv4∀\"v4∀~∃∪8AiQJ↓OK]KICiS←8Aae← KgfX↓←]JA5kghA→S]HA¬YXAa=ggSE1JAoCefA←L4∃aCeQSiS←9S]NAQQJAO%mK\A→←e[k1BAS]Q↑AgkAKeCi=ZAa←QfAC]⊂ABAe∃[CS]%]N~∃A←hXAMkGPAQQChA5←YKGUYKfA C\AE∀AG←]MiekGQKH\AQQJAG=]gSI∃eCiS=]fAS8~∃M←I[S]N↓gkaKICi←Z↓aCei%iS←]LAIKC0AaeS5CeSYdAoSi AmCY∃]GJA¬]H~∃U]gCiUeCiS=\\A)!SfAaI←GKIUeJASLAgk[5CeSu∃HAS\↓βaaK9ISpAXA'kAKeCi=Z~∃!¬eiSi%←]f\↓)QJAACeiSQS←]f↓oQSG AeKgUYhACIJAgk5[CeSiKHAS8A)CE1JA∪∩8~∀~∀_]¬≥∪≤A)¬¬→ A%∩YβY1←oKH↓!Cei%iS←]LA←LAαm*αLA∪]i<A'ka∃eCi←5a←if↓C]HAIK[CS9S]NAA←h\~(~∃!CIiSiS=\@@@@@A≥U[EKd↓←L@AMkaKe¬i←[a=h@A≥U[EKd@@@AIK[CS9S]N~)≥k[E∃d@@@@@A'UaKeCQ←[a←Qf@@@b@@@@@d@@@@@f@@@@@@AA←h~∀4∀@@@@b@@@@@@@@@@@b@@@@Aε∧m*αf@@@Z@@@@@Z@@@@@@@Z~∀@@@@H@@@@@@@@@@@@D@@@@AεαkTαf@@@Z@@@@@@4@@@@@@Aε∧b~∀@@@@f@@@@@@@@@@@b@@@@↓εαi*∧f@@@Z@@@@@@Z@@@@@AεαH~∀@@@@h@@@@@@@@@@@b@@@@Aαg*αL@@@@4@@@@@@Z@@@@@Aεαf4∀~∀@@@@j@@@@@@@@@@@d@@@@↓εαi*∧d@@Aαe*αD@@@@Z@@@@@@@Z~∀@@@@l@@@@@@@@@@@d@@@@↓εαg*∧d@@Aαe*αD@@@@Z@@@@@@Aαb~∀@@@@\@@@@@@@@@@@@H@@@@AεαeTαd@@↓εαe*∧b@@@@Z@@@@@@↓εαd~(~∀@@@@p@@@@@@@@@@@d@@@@Aαi*αD@@Aε∧e*αd@@@@4@@@@@@@@4~∀@@@@r@@@@@@@@@@@d@@@@Aαg*αD@@Aε∧e*αd@@@@4@@@@@@Aε∧b~∀~(@@@@D`@@@@@@@@@@@d@@@@AεαM*αd@AεαgTαb@@@@Z@@@@@@@Z~(~∀@@@bb@@@@@@@@@@@f@@@@Aαe*αD@@Aε∧e*αb@@Aε∧e*αb@@@@@Z~∀4ZZZZ4ZZZZ4ZZZZ4ZZZZ4ZZZZ4ZZZZ4ZZZZ4ZZZZ4ZZZZ4ZZZZ4ZZZZ4ZZZZ4ZZZZ4ZZ~∀9≥λ~(.HD PART B. Ring-superatom Construction. Each partition (Table II) must now be treated in turn. The complete set of ring-superatoms for each superatompot in a given partition must be constructed. The major steps in the procedure are outlined in Figure 2. .GRAF Valence List. The first step in part B is to strip the superatompot of atom names, while retaining the valence of each atom. The numbers of each type of atom are saved for later labelling of the ciliated skeleton (Chart I). A valence list may then be specified, giving in order the number of bi-, tri- , tetra- and n-valent nodes which will be incorporated in the superatom. Thus the superatompot C↓6U↓3 is transformed into the valence list 0 bivalents, 0 trivalents, 6 tetravalents (0, 0, 6), and C↓4U↓2 becomes (0, 0, 4) (Figure 2). .GRAF Calculation of Free Valence. From the valence list and the associated unsaturation count the number of free valences of each superatompot is determined uniquely. (see Calculation of Free Valence, Appendix C). For C↓6U↓3 the free valence is eight (Fig. 2). .GRAF Partitioning of Free Valence. The free valences are then partitioned among the nodes in the valence list in all possible, unique ways. .GRAF Degree List. Each partition of free valences alters the effective valence of the nodes in the original valence list with respect to the ring-superatom. In the example, assignment of one or two free valences to a tetravalent node transforms this node into a tri- or bivalent node respectively. As the ring-superatom is constructed, those tetravalent nodes which have been assigned, say, two free valences, have then only two valences remaining for attachment to the ring-superatom. These nodes are then of degree (17) .SEND FOOT ⊂ .BEGIN SPACE1; INDENT 0,0; 17) Use of the term degree in this report refers to the number of bonds other than free valences, with double bonds being counted twice. A free valence may or may not eventually be attached to a hydrogen atom in the final structure. .END; .⊃; two and may be termed secondary nodes. Thus the partition of free valences 2,2,2,2,0,0 on six tetravalent nodes yields the degree list (4,0,2) (Fig. 2) as four of the tetravalent nodes receive two free valences each yielding four nodes of degree two (secondary) and leaving two nodes of degree four (quaternary). The program keeps track of the number of free valences assigned to all nodes for use in a subsequent step. .GRAF Loops. As will be clarified in the subsequent discussion, there are several general types of ring-superatoms which cannot be constructed from the vertex-graphs available in the CATALOG (described below). These are all cases of multiple extended unsaturations either in the form of double bonds or rings. Examples are the following: .BEGIN NOFILL 1) bi-, tri-, ... n-cyclics with exocyclic double bonds; 2) some types of SPIRO ring systems; 3) allenes extended by additional double bonds, e.g., C=C=C=C .END The concept of a loop, consisting of a single unsaturation and at least one bivalent node must be specified for these cases. Examples of loops containing one, two and three bivalent nodes are shown in Chart II. Note that the two remaining "ends" of thhe unsaturation will yield a "looped structure" when attached to a single node in a graph (shown as X, Chart II). .BEGIN NOFILL GROUP --------------------- Chart II bivalents = 1 2 3 .SKIP 10 --------------------- .END The method for specification of loops is discussed in Calculation of Loops, Appendix C. .GRAF Partitioning of Secondary Nodes among Loops and Nonloops. The secondary nodes in the degree list are partitioned among the loops (if any) calculated in the previous step and non-loop secondary nodes of the graph Aspects of this partitioning step are presented in Partitioning of Secondary Nodes, Appendix C. Results for the example are indicated in Figure 2. .GRAF Reduced Degree List. This procedure yields the reduced degree list which contains none of the secondary nodes originally present in the degree list. Any secondary nodes appearing in the reduced degree list are termed "special" secondary nodes as these nodes will have loops attached in subsequent steps. .GRAF Vertex-Graphs. The reduced degree lists are used to specify a set of vertex-graphs for the eventual ring-superatoms. All two- connected structures can be described by their vertex-graphs, which are, for most structures, regular trivalent graphs. This concept has been described in detail by Lederberg (10), who has also presented a generation and classification scheme for such graphs. Given a set of all vertex-graphs, the set of all ring-superatoms may be specified. The vertex-graphs are maintained by the program in the CATALOG. Catalog entries for regular trivalent graphs possessing two and four nodes are presented in Table III. This list must be supplemented by additional vertex-graphs to cover several special cases required for generation of all structures for the example. These are also presented in Table III. .PAGE←PAGE+1 .<< make room for Table III >> With the reduced degree list of a superatompot, the program requests the appropriate CATALOG entries. In the example (Fig. 2), the reduced degree list (4,0,2) specifies vertex-graphs containing two quaternary nodes (tetravalent dihedron). The reduced degree list (2,4,0) specifies regular trivalent graphs of four nodes, of which there are two: 4AA and 4BB (Table III). When λonlyβ secondary nodes are present in the reduced degree list, the graph "Singlering" (Table III) is utilized. .GRAF Interlude. Up to this point the program has effectively decomposed the problem into a series of subproblems, working down from the total pot of atoms through a series of partitions and subpartitions to the set of possible vertex-graphs. In subsequent steps the vertex-graphs are expanded to the final structures by a series of constructive graph labellings (Table IV). .BEGIN TABLE IV,The Six Graph Labelling Steps Performed by the Labelling Algorithm Labelling Step Function 1 Label Edges of Vertex-Graphs with Special Secondary Nodes. 2 Label Edges of Resulting Graphs with Non-Looped Secondary Nodes. 3 Label Loops of Resulting Graphs with Looped Secondary Nodes. 4 Label Nodes of Skeletons with Free Valences. 5 Label Nodes of Skeletons with Atom Names. 6 Label Free Valences of Superatoms with Radicals (see Appendix D). -------------------------------------------------------------------- .END .GRAF Labelling Edges of Vertex-Graphs with Special Secondary Nodes. Special secondary nodes are those that will have loops attached. The specification of the possible attachments of the nodes to the graph ia a "labelling" procedure. This is the first of six such graph labelling steps performed by the program. (Table IV). All of these labelling steps involve the same combinatorial problem, that of associating a set of λnβ labels, not necessarily distinct, with a set of objects with arbitrary symmetry (13). The same labelling algorithm is utilized for each of the six labelling steps. A description of the underlying mathematics and proof of completeness and irredundancy appears separately (14). Some aspects of the first labelling step indicate how equivalent labellings (which would eventually yield duplicate structures) may be avoided prospectively, by recognition of the symmetry properties of the graph; in the first labelling, the vertex-graph. These symmetry properties are expressed in terms of the permutation group (see Appendix A and refs. 13 and 14) on the edges of the vertex-graph. This permutation group, which defines the equivalence of the edges, may be specified in the CATALOG or, alternatively, calculated as needed by a separate part of the cyclic structure generator. As subsequent steps are executed, a new permutation group (e.g., on the nodes for labelling step four, Table IV) is derived as necessary (13). Thus, only labellings which result in unique expansions of the structure are permitted. The reader examining Fig 2 may note that for this simple example the symmetries of the vertex-graphs and subsequent skeletons can be discerned easily by eye. For example, all edges of the tetravalent dihedron are equivalent, as are all the edges of the regular trivalent graphs 2A and also 4BB. The $3BCB graph (Table II, Fig. 2) has four equivalent edges and one other edge, and so forth. In the general case, however, the symmetries of the vertex-graphs and subsequent expansions thereof are not always obvious. With the group on the edges specified, the labelling of the vertex-graphs with special secondary nodes is carried out. The results of this procedure for partitions containing loops are indicated in Figure 2. .GRAF Labelling with Non-Loop Secondary Nodes. The graphs which resulted from the previous labelling are now labelled with the partitions of non-loop secondary nodes (Appendix C). Each of the five partitions for the left-most vertex-graph in Fig. 2 immediately above results in a single labelling, as all four edges of the graph are equivalent. When edges are distinguishable there may be several ways to label a graph with a single partition. There are, for example, for the $3BCB graph, two ways to label with the partition 3,0,0,0,0, four ways with the partition 2,1,0,0,0 and three ways with the partition 1,1,1,0,0 (Fig. 2). .GRAF Labelling with Loop Secondary Nodes. There remain unassigned to the graphs at this point only secondary nodes which were assigned to loops. These nodes are first partitioned among the loops. (see Partitioning of Loop Secondary Nodes, Appendix C). For example, following the path from the degree list (4,0,2) through labelling with non-loop secondary nodes (Fig. 2), there are two ways of labelling the two equivalent loops with four secondary nodes. There is one way to label the two loops of the adjacent graph with three secondary nodes and one way of labelling the two loops of each of the two remaining graphs in this section of Figure 2 with two secondary nodes. In this example (C↓6U↓3) the loops in every case are equivalent or there is only one loop to be labelled. In the general case loops may not be equivalent, resulting in a greater number of ways to label loops with a given partition of secondary nodes. .GRAF Cyclic Skeletons. The previous labelling steps specified the number of secondary nodes on each edge of and loop attached to the vertex-graphs. All atoms in the original superatompot are thus accounted for. A representation of the result is the cyclic skeleton, where nodes and their connections to one another are specified. (These skeletons begin to resemble conventional chemical structures.) .GRAF Labelling with Free Valences. The nodes in a cyclic skeleton are then labelled with free valences, yielding ciliated skeletons. This labelling is trivial in the example, as all atoms are of the same valence (four) (Figure 2). Free valence labelling is performed with knowledge of how many atoms of each valence were present in the original superatompot, but independent of the λidentitiesβ of the atoms. The combinatorial complexity of this labelling problem follows from the possible occurance of atoms with differing valencies. In the general case there may be several ways to perform this labelling on a single cyclic skeleton, whereas in the C↓6U↓3 example there is only one way. .GRAF Labelling with Atom Names. The nodes of a ciliated skeleton are then labelled with atom names to yield the ring-superatom(s). Again this labelling is trivial in the example, as only one type of atom is present (carbon), yielding in each case only a single superatom (Fig. 2). If there is more than one type of atom with the same valence (e.g., silicon and carbon), the labelling problem is more complex. Each node of appropriate valence may be labelled with either type of atom. Duplicate structures are avoided by calculations involving the group pertaining to the set of nodes of equal valence. .HD PART C. Acyclic Generator. The superatom partition expanded in the example had no atoms assigned to acyclic chains (remaining pot). The set of superatoms on completion of Part B, above, thus yields the set of 36 structures on placement of a hydrogen atom on each free valence (Fig. 2). If the superatom partition (partitions 2-11, Table II) contained more than one superatompot or any atoms in the remaining pot, the acyclic generator must be used to connect the segments of the structure in all ways. This procedure is described in detail in Appendix D. .TITL DISCUSSION αCompletion ofβ C↓6H↓8. The example (Fig. 2) has considered only expansion of a single superatom partition. It might be instructive for the reader to attempt to generate all, or at least the remaining, structures for C↓6H↓8. The number of solutions is presented in a subsequent section. If the algorithm as outlined in Figure 2 is followed, it is suggested that the initial superatom partitions in Table II be examined carefully. These partitions yield some indication of the types of structures which will result from each partition. For example, partition 4, C↓3U↓3 in a single superatompot, plus three carbons in the remaining pot, should yield all structures containing a three-membered ring possessing two double bonds or a triple bond. As there are only two free valences, the remaining atoms can be in a single chain (as a propyl or iso-propyl radical) or as a methyl and an ethyl group, but not as three methyl groups. .GRAF Completeness and Irredundancy. Although a mathematical proof of the completeness and irredundancy of the method exists (15), there is no guarantee that the implementation of the algorithm in a computer program maintains these desired characteristics. Confidence in the completeness and irredundancy of a program of this complexity can be engendered in the following ways: .BEGIN INDENT 6 1) Verification of the program's performance by another, completely independent approach. An independent method has been developed which enumerates, but does not construct all isomers of compositions containing C,H,N, and O (18). .SEND FOOT ⊂ .BEGIN SPACE1; INDENT 0,0; 18) L. Masinter, Enumeration of Graphs of Molecules; in preparation. .END; .⊃; It is interesting that the program for simple counting of the solutions is significantly slower than construction of all of the solutions, despite some effort to improve the efficiency of the former program. Thus, due to limitations of computer time, we have been limited to compositions containing only 5 or fewer non-hydrogen atoms. For these cases, however, the numbers of isomers obtained by both programs agree. 2) Testing by manual generation of structures. Several chemists, all without knowledge of the algorithm described above, have been given several test cases, including C↓6U↓3, from which structures were generated by hand. Familiarity with chemistry is no guarantee of success, as evidenced by the performance of three chemists for the superficially simple case of C↓6U↓3 (C↓6H↓8, Table V). .BEGIN TABLE V,|Performance of Three↑* Chemists in Manual Generation of Isomers of C↓6H↓8 (C↓6U↓3). There are 159 Isomers.|; Number Generated Type of Error Chemist 1 161 4 duplicates; 4 omissions; 2 with 7 carbon atoms. Chemist 2 168 16 duplicates; 7 omissions Chemist 3 160 2 duplicates; 1 omission * One PhD and two graduate students. -------------------------------------------------------------------- .END This example indicates that for more than very trivial cases, it is extremely difficult to avoid duplicates (tricyclics, for example, are difficult to visualize when testing for duplicates) and omissions. Omissions appear to result from both carelessness and forgetting ring systems that are implausible or unfamiliar. The program seems better at testing the chemist than vice versa. In every instance of manual structure generation, no one has been able to construct a legal structure that the program failed to construct. No one has been able to detect an instance of duplication by the program. This performance builds some confidence, but manual verification of more complicated cases is extremely tedious and difficult. Isomers for many empirical formulae have been generated, and some results are tabulated in Table VI. The choice of examples has been motivated by a desire to test all parts of the program where errors may exist while keeping the number of isomers small enough to allow verification. In this manner all obvious sources of error have been checked, for example, construction of loops on loops, multiple types of atoms of the same valence (e.g. Cl, Br, I) and partitions containing atoms of several different valences including penta- and hexavalent atoms. 3) Varying the order of generation. The structure of the program permits additional tests by doing some operations in a different order. For example, one variation allowed is to leave hydrogens associated with the atoms in each partition rather than to strip them away initially and place them on the remaining free valences in the last step. Each such test has resulted in the same set of isomers. 4) Using Polya enumeration (6) at the various labelling steps of the procedure to verify the correctness of sub-parts of the program. Using various combinatorial formulae, one can insure that the results of at least parts of the program are consistant with independant calculations. This approach was used extensively in the development of the labelling algorithm. .END In summary, the verification procedures utilized have all indicated absence of errors in the computer implementation of the algorithm. Also, there is no clear reason why generation of larger sets of isomers should not also proceed correctly. The final verdict however, must await development of new mathematical tools for verification by enumeration (see above) or an alternative algorithm. .BEGIN TABLE VI,The Number of Isomers for Several Empirical Formulae Empirical Example Number of Isomers Manually Verified? Formula Compound C↓6H↓6 benzene 217 yes C↓6H↓8 1,3-cyclohexadiene 159 yes C↓6H↓1↓0 cyclohexene 77 yes C↓6H↓1↓2 cyclohexane 25 yes C↓6H↓1↓4 hexane 5 yes C↓6H↓6O phenol 2237 C↓6H↓1↓0O cyclohexanone 747 no C↓6H↓1↓2O 2-hexanone 211 yes C↓3H↓4N↓2 pyrazole 155 no C↓3H↓6N↓2 2-pyrazoline 136 yes C↓3H↓8N↓2 tetrahydropyrazole 62 no C↓3H↓1↓0N↓2 propylenediamine 14 yes C↓4H↓9P↓1 (pentavalent P) 110 no -------------------------------------------------------------------- .END .GRAF Constraints. The structure generator is designed to produce a list of all possible graph connectivity isomers (Appendix B). This list contains many structures whose existence seems unlikely based on present chemical knowledge. In addition, the program may be called on to generate possible structures for an unknown in the presence of a body of data on the unknown which specify various features, (e.g., functional groups) of the molecule. In such instances mechanisms are required for constraining the generator to produce only structures conforming to specified rules. The implementation of the acyclic generator possessed such a mechanism in the form of GOODLIST (desired features) and BADLIST (unwanted features) (3) which could be utilized during the course of structure generation. The cyclic generator is less tractable. As in prospective avoidance of duplicate structures, it is important that unwanted structures, or portions thereof, be filtered out as early in the generation process as possible. It is relatively easy to specify certain general types of constraints in chemical terms, for example, the number of each of various types of rings or ring systems in the final structure, ring fusions, functional groups, sub-structures and so forth. It is not always so easy to devise an efficient scheme for utilizing a constraint in the algorithm, however. As seen in the above example (Fig. 2) the expanded superatom partition results in what would be viewed by the chemist as several very different ring systems. The design of the program facilitates some types of constraints. For example, the program may be entered at the level of combining superatoms to generate structures from a set of known sub-structures. If additional atoms are present in an unknown configuration, they can be treated as a separate generation problem, the results of which are finally combined in all ways with the known superatoms. This approach will not form additional two-connected structures, however. Constraints which disallow an entire partition may be easily included. For example, it is possible to generate only open chain isomers by "turning off" the appropriate initial superatom partition. Much additional work remains, however, before a reasonably complete set of constraints can be included. The implementation of each type of constraint must be examined and tested in detail to ensure that the generator remains thorough and irredundant. .HD CONCLUSIONS The boundaries, scope and limitations of chemical structure can now be specified. .HD ACKNOWLEDGEMENTS We wish to express our thanks to Professor Harold Brown, (currently at Ohio State University) who provided valuable assistance in the formulation of the mathematical principles underlying the structure generator. .NEXT PAGE .SPACE1 .HD Appendix A. Equivalence Classes and Finite Permutation Groups. Two members of a set of possible isomers may be defined to be equivalent if a specified transformation of one member causes it to be superimposable upon another member of the set. For example, there are fifteen possible ways of attaching two chlorine and four hydrogen atoms to a benzene ring (Chart III). .BEGIN NOFILL ----------------------------------------------------------------- Chart III .group skip 30 ----------------------------------------------------------------- .END If rotations by multiples of 60 degrees are specified as allowed transformations, the fifteen structures fall logically into three classes, termed "equivalence classes" (Scheme 4). Within each equivalence class structures may be made superimposable by the rotational transformation. If one element (in this case a molecular structure) is chosen from each equivalence class, the complete set of possible structures is determined, without duplication. It is the task of the labelling algorithm to produce one and only one graph labelling corresponding to one member of each equivalence class. The set of transformations which define an equivalence class is termed a "finite permutation group." This permutation group may be calculated based on the symmetry properties of a graph (or chemical structure in the example of Scheme 4). This calculation provides the mechanism for prospective avoidance of duplication. These procedures are described more fully in the accompanying paper (13). .NEXT PAGE .HD Appendix B. Isomerism and Symmetry. Appendix A introduced the concept of equivalence classes and finite permutation groups. The selection of transformation (Appendix A) directs the calculation of the permutation group and thus defines the equivalence classes. Different types of transformation may be allowed depending on the symmetry properties of the class of isomers considered. This Appendix discusses several of the possible types of isomerism, most of which are familiar to chemists. The reader seeking a more thorough discussion of some types of isomerism discussed below is referred to an exposition of molecular symmetry in the context of chemistry and mathematics (19). .SEND FOOT ⊂ .BEGIN SPACE1; INDENT 0,0; 19) I. Ugi, D. Marquarding, H. Klusacek, G. Gokel, and P. Gillespie, λAngew. Chem. internat. Edit.β, α9β, 703 (1970). .END; .⊃; Isomers are most often defined as chemical structures possessing the same empirical formula. Different concepts of symmetry give rise to different classes of isomers, some of which are described below. .GRAF Permutational Isomers. Permutational isomers are isomers which have in common the same skeleton and set of ligands. They differ in the distribution of ligands about the skeleton. Gillespie et al. (20) .SEND FOOT ⊂ .BEGIN SPACE1; INDENT 0,0; 20) P. Gillespie, P. Hoffman, H. Klusacek, D. Marquarding, S. Pfohl, F. Ramirez, E. A. Tsolis, and I. Ugi, λAngew. Chem.β λinternat. Edit.β, α10β, 687 (1971). .END; .⊃; and Klemperer (21) .SEND FOOT ⊂ .BEGIN SPACE1; INDENT 0,0; .NOFILL 21) a) W. G. Klemperer, λJ. Amer. Chem. Soc.β, α94β, 6940 (1972); b) W. G. Klemperer, λibidβ, p. 8360. .END; .⊃; have used the concept of permutational isomers to probe into unimolecular rearrangement or isomerization reactions. .GRAF Stereoisomers. Ugi et al. (19) have defined the "chemical constitution" of an atom to be its bonds and bonded neighbors. Those permutational isomers which differ only by permutations of ligands at constitutionally equivalent positions form the class of stereoisomers. .GRAF Isomers Under Rigid Molecular Symmetry. If one conceives of molecular structures as having rigid skeletons, the physical rotational (three dimensional) symmetries and transformations may be readily defined. Each transformation causes each atom (and bond) to occupy the position of another or same atom (and bond) so that the rotated structure can physically occupy its former position and at the same time be indistinguishable from it in any way. This is the most familiar form of symmetry. Under this type of symmetry conformers are distinguishable and belong in distinct equivalence classes. Every transformation is orthogonal and preserves bond angles and bond lengths as well as maintaining true chirality. If one allows other orthogonal transformations that alter chiral properties of structures, equivalence classes result that treat both the left-handed and right-handed forms of chiral molecules to be the "same". Thus a "mirror image" transformation when suitably defined permits the left-handed form to exactly superimpose the right-handed form and vice-versa. .GRAF Isomers Under Total Molecular Symmetry. If in addition to the above mentioned rigid molecular transformations one recognizes the flexional movements of a nonrigid skeleton, a dynamic symmetry group may be defined. Under this definition, different conformers now are grouped together. Thus the "chair" and "boat" conformations of cyclohexane belong to the same equivalence class under dynamic symmetry. The permutation group of skeletal flexibility is computable separately and independently of rigid molecular symmetry. One can then view total molecular symmetry as the product of the two finite permutation groups. .GRAF Isomers Under Connectivity Symmetry. The concept of connectivity symmetry was introduced previously (METHOD section). Every permutation of atoms and bonds onto themselves is a symmetry transformation for connectivity symmetry if, .BEGIN INDENT 6,6 a) each atom is mapped into another of like species, e.g., N to N, C to C, O to O, and b) for every pair of atoms, the connectivity (none, single, double , triple, ...) is preserved in the mapping, i.e. the the connectivity of the two atoms is identical to the connectivity of the atoms they are mapped into. .END One can readily recognize that transformations as defined automatically preserve the valence and bond distribution of every atom. It is very probable that readers accustomed to three dimensional rotational and reflectional symmetries will tend to equate them with the symmetries of connectivity. It is emphasized again that connectivity symmetry does not consider bond lengths or bond angles, and it includes certain transformations that are conceivable but have no physical interpretation save that of permuting the atoms and bonds. .NEXT PAGE .HD Appendix C .GRAF Superatom ParAtitions. The first step is to replace the hydrogen count with the degree of unsaturation. The number of unsaturations (rings plus double bonds) is determined from the empirical formula in the normal way, as given in equation 1. .BEGIN NOFILL U = 1/2 (2+↑n&↓[i=1]&∃(i-2)a↓i)→(1) U = unsaturation i = valence n = maximum valence in composition a↓i= number of atoms with valence i .END If the unsaturation count is zero, the formula is passed immediately to the acyclic generator. Specifying the unsaturations as U's, the example C↓6H↓8 becomes C↓6U↓3 (hydrogen atoms are omitted by convention). There are several rules which are used during the partitioning scheme, as follows: .BEGIN INDENT 0,5 I. The resulting formula is stripped of other univalent atoms (e.g., chlorine) as such atoms cannot be part of two-connected ring-superatoms. These univalent atoms are relegated to the pot of remaining atoms. II. The remaining pot in a given partition (those atoms not allocated to superatompots) can contain NO unsaturations. Thus ALL rings and/or double bonds will be generated from the superatompots. III. It follows that every superatompot in the partition must contain at least two atoms of valence two or higher plus at least one unsaturation. If there are no unsaturations then no rings could be built. In addition, an unsaturation cannot be placed on a single atom. This rule defines the minimum number of atoms and unsaturations in a superatompot. IV. The maximum number of unsaturations in a superatompot is given by Equation 2. Superatoms must possess at least one free valence (12), so that superatompots with no free valences, e.g., O↓2U↓1 or C↓2U↓3, are not allowed, unless the superatompot contains all atoms in the emperical formula (since no univalents, and thus no hydrogens, are allowed in a superatompot, this is indeed a rare occurance.) .BEGIN NOFILL U↓[max]= 1/2 (↑n&↓[i=3]&∃(i-2)a↓i)→(2) U↓[max]= maximum unsaturation of a superatompot i = valence a↓i = number of atoms with valence i .END V. The maximum number of superatompots for a given formula is defined by equation 3. .BEGIN NOFILL S↓[max]= 1/2↑n&↓[i=2]&∃a↓i→(3) n = maximum valence in composition S↓[max]= maximum number of superatompots in a superatom partition note: the summation is over all atoms of valence >2; univalents are not considered. .END .END Rules I-V define the allowed partitions of a group of atoms into superatompots. These rules do not, however, prevent generation of equivalent partitions, which would eventually result in duplicate structures. Defining a canonical ordering scheme to govern partitioning prevents equivalent partitions. One such canonical ordering is as follows: .GRAF Canonical Ordering for Partitioning. .BEGIN INDENT 6,6 a. Partition in order of increasing number of superatompots. b. For each entry in each part of (a), partition in order of decreasing size of superatompot by allocation of atoms one at a time to the remaining pot. c. Each individual partition containing two or more superatompots must be in order of equal or decreasing size of the superatompot. In other words, the number of atoms and unsaturations in superatompot n+1 must be equal to or less than the number in superatompart n. The program notes the equality of superatompots in a partition to avoid repetition. .END The application of rules I-V is best illustrated through reference to the example of C↓6U↓3. The maximum number of superatompots for this example is three (Equation 3). There is one way to partition C↓6U↓3 into one superatompot with no remaining pot, partition 1, Table II. Subsequent assignment of carbon atoms one at a time to the remaining pot results in partitions 2-4, Table II. The next partition following the sequence 1-4 would be C↓2U↓3 with C↓4 assigned to the remaining pot. This partition is forbidden as C↓2U↓3 has no free valences. The three ways to partition C↓6U↓3 into two superatompots are indicated along with the corresponding partitions following assignment of atoms to the remaining pot, as partitions 5-10, Table II. There is only one unique way of partitioning C↓6U↓3 into three superatompots, partition 11, Table II. .GRAF Calculation of Free Valence. The expression for the free valence of a superatompot is given by equation 4. .BEGIN NOFILL FV = (2 +↑n&↓[i=3]&∃(i-2)a↓i)-2U→(4) U = unsaturation of superatompot i = valence n = maximum valence in composition a↓i= number of atoms with valence i FV = free valence .END .GRAF Partitioning of Free Valence. Because ring-superatoms are two-connected structures two valences of each atom of a superatompot must be used to connect the atom to the ring-superatom. Thus no free valences can be assigned to bivalent nodes in the valence list, a maximum of one to each trivalent, a maximum of two to each tetravalent, and so forth. The example (Fig. 2) is further simplified in that there are only tetravalent nodes in the valence list. Inclusion of trivalent nodes (e.g., nitrogen atoms) merely extends the number of possible partitions. The free valences are partitioned among the tetravalent nodes in all ways, as illustrated in Figure 2. It is important to note that removal of atom names makes all n-valent (n=2 or 3 or ...) nodes in the valence list equivalent at this stage. Thus the partitions (of eight free valences among six tetravalent nodes) 222200, 222020, 222002, ......, 002222 are all equivalent. Only one of these partitions is considered to avoid eventual duplication of structures. .GRAF Calculation of Loops. There are several rules which must be followed in consideration of loop assignment to ring-superatoms. The minimum (MINLOOPS) and maximum (MAXLOOPS) numbers of loops for a given valence list are designated by equations 5 and 6. .BEGIN NOFILL MINLOOPS = max{0 ,a↓2 + ↑1&↓2&-(2j↓[max]-↑n&↓[i=2]&∃ja↓j)}→(5) MAXLOOPS = min{a↓2, ↑n&↓[j=4]&∃(↑[j-2]&↓[ 2 ]&[---]a↓j}→(6) MINLOOPS = minimum number of loops. MAXLOOPS = maximum number of loops. a↓2 = number of secondary nodes in degree list. j↓[max] = degree of highest degree item in degree list. j = degree. n = highest degree in list. a↓j = number of nodes with degree j. .END The form of the equations results from the following considerations: .BEGIN INDENT 6,6 1) Only secondary nodes may be assigned to loops. Nodes of higher degree will always be in the non-loop portion of the ring-superatom. 2) A loop, by definition, must be attached by two bonds to a single node in the resulting ring-superatom. The loop cannot be attached through the free valences. Thus the degree list must possess a sufficient number of quaternary or higher degree nodes to support the loop(s). 3) Each loop must have at least one secondary node, which is the reason MAXLOOPS is restricted to be at most the number of secondary nodes in the degree list (Equation 6). 4) There must be available one unsaturation for each loop (this is implicit in the calculation of MINLOOPS and MAXLOOPS) as each loop effectively forms a new ring. .END .GRAF Partitioning of Secondary Nodes among Loops/Non-Loops. For each of the possible numbers of loops (0,1, ...) the secondary nodes are removed from the degree list and partitioned among the loops, remembering that the loops are at present indistinguishable and each loop must receive at least one secondary node. In the example (Fig. 2), starting with the degree list (4,0,2), there are three ways of partitioning the four secondary nodes among two loops and the remaining non-looped portion. Removal of the four secondary nodes from the degree list and assignment of two, three or four of them to two loops results in the list specified in Figure 2 as the "reduced degree list". Specification of two loops transforms the two quaternary nodes in the degree list into two secondary nodes. This results from the fact that two valences of a quaternary or higher degree node must be used to support each loop. These are "special" secondary (or higher, for atoms with valence > 4) nodes, however, as these particular nodes will have loops attached as the structure is built up. Thus, in the example, any secondary nodes which are found in the reduced degree list will have a loop attached in a subsequent step. The degree list (4,0,2) thus becomes the reduced degree list (2,0,0) in the partition specifying two loops (Fig. 2). Similarly, the partition of one loop for the degree list (3,2,1) results in a reduced degree list of (1,2,0) with the three original secondary nodes partitioned among loop and non-loop portions (Figure 2). If, after the first, second, ... nth loop partition, there remain one or more quaternary or higher degree nodes in the reduced degree list, the list must be tested again for the possibility of additional loops. Each loop partition will result in an additional set of structures. The second pass will yield those structures possessing loops on loops, and so forth. One such superatom which would be generated in this manner from a composition of (at least) C↓6U↓5 is 15. .BEGIN NOFILL C=C=C=C=C=C α15β .END .GRAF Partitioning of Non-Loop Secondary Nodes Among Edges. The secondary nodes which were not assigned to loops ("non-looped secondary nodes") are partitioned among the edges of the graphs after labelling with loops. Loops are not counted as edges. There are, for example, five ways to partition four non-loop secondary nodes among the edges of the vertex-graph possessing two quaternary nodes (Fig. 2). .GRAF Partitioning of Loop Secondary Nodes among Loops. This partitioning step is carried out assuming indistinguisability of the loops. Each loop must recieve at least one secondary node, which limits the number of possible partitions. Results are presented in Figure 2. .NEXT PAGE .HD Appendix D - Acyclic generator A method of construction of structures similar to the generation of acyclic molecules is utilized to join multiple ring-superatoms and remaining atoms. The DENDRAL algorithm for construction of acyclic molecules (3,10a,24) .SEND FOOT ⊂ .BEGIN SPACE1; INDENT 0,0; 24) See B. G. Buchanan, A. M. Duffield, and A. V. Robertson, in "Mass spectrometry, Techniques and Applications," G. W. A. Milne, ed., John Wiley and Sons, Inc., 1971, p. 121. .END; .⊃; relied on the existence of a unique central atom (or bond) to every molecule. The present acyclic generator uses the same idea. The present algorithm, though simpler in not having to to treat interconnection of atoms or ring-superatoms through multiple bonds, is more complex because of the necessity to deal with the symmetries of the ring-superatoms. .HD D1. Method for the case with even number of total atoms. The superatom partition C↓2U↓2/C↓2U↓1/-/C↓2 (partition 7, Table II and Figure 2) will be used here to illustrate this procedure. The superatompots C↓2U↓2 and C↓2U↓1 have exactly one possible ring-superatom for each (see Table VII). .BEGIN TABLE VII Superatompot Superatom C↓2U↓2 -C≡≡C- C↓2U↓1 >C==C< -------------------------------------------------------------------- .END Thus acyclic structures are to be built with -C≡≡C- , >C==C< and two C's. There are an even number of atoms and ring-superatoms. The structures to be generated fall into two categories: (a) those with bond centroid; (b) those with an atom centroid. .HD |Category A. BOND CENTROID (see Fig. 3)|; .HD |Step 1. Partition into Two Parts.|; The atoms and ring-superatoms in the list of superatoms are partitioned into two parts, with each part having exactly half the total number of items. Each atom or ring-superatom is a single item. Each part has to satisfy equation 7, called the Restriction on Univalents. .BEGIN NOFILL αRestriction on Univalents:β 1+↑n&↓[i=1]&∃(i-2) a↓i≥ 0→(7) i = valence. a↓i = number of atoms or superatoms of valence i. n = maximum valence in composition. .END There are two ways of partitioning the four items into two parts (Fig. 3). The restriction on univalents is satisfied in each case. The restriction will disallow certain partitions that have "too many" univalents other than hydrogens and therefore is essential only in partitioning compositions that contain any number of non-hydrogen univalents. .HD |Step 2. Construct Radicals from Each Part.| Using a procedure described in Section D3, radicals are constructed from each part in each partition. The result of application of this procedure to the example as shown in Table VIII. .BEGIN TABLE VIII, Radicals Constructable from Given Parts Part | Radicals ---------------------+------------------------------ -C≡≡C- , >C==C< -> -C≡≡C-CH=CH2 -CH=CH-C≡≡CH -C-C≡CH " CH2 ---------------------+------------------------------ C2 -> -CH2-CH3 ---------------------+------------------------------ -C≡≡C- , C -> -C≡≡C-CH3 -CH2-C≡≡CH ---------------------+------------------------------ >C==C< , C -> -CH==CH-CH3 -C-CH3 " CH2 -CH2-CH==CH2 -------------------------------------------------------------------- .END .HD |Step 3. Form Molecules From Radicals.| The radicals are combined in unique pairs, within each initial partition. Each pair gives rise to a unique molecule, for each of which the centroid is a bond. There are nine such molecules for the example chosen (Fig. 3). .HD |Category B. ATOM CENTROID (see Fig. 4).| .HD |Step 1. Selection of Centroid.| One must consider every unique atom or ring-superatom that has a free valence of three or higher as an atom centroid. In the example, of three candidates available: -C≡≡C- , >C==C< and C, the first is not chosen for it has a free valence of only two. .HD |Step 2. Partition the Rest of the Atoms.| The atom or ring-superatom chosen for the center is removed from the set and the rest are partitioned into a number of parts less than or equal to the valence of the central atom. Each part must have less than half the total number of items being partitioned (again a ring-superatom is a single item). Each part must satisfy the restriction on univalents (equation 7). Thus, for the case where a carbon is the center, from one to four partitions are generated with the condition that each part has at most 1 atom (the number of atoms is ≤(4-1)/2). No valid partitions are possible for one, two or four parts. There is exactly one partition for three parts, i.e., one in each. The partitions are shown in Figure 4. .HD |Step 3. Construct Radicals.| Once again, using the procedure described in Section D3, radicals are constructed for each part in each partition. For example, the partition -C≡≡C- gives rise to exactly one possible radical -C≡≡CH (Fig. 4). .HD |Step 4. Combine Radicals.| Although in the example shown every part generates only one radical, in the general case there will be many radicals for each part. If so, the radicals must be combined to give all unique combinations of radicals within each part. .HD |Step 5. Form Molecules from Central Atom and Radicals.| If the central atom is not a ring-superatom but is a simple atom, then each combination of radicals derived in Step 4 defines a single molecule that is unique. Thus for example when C is chosen as the centroid, step 4 gives one combination of radicals which determines a single molecule when connected to the central C (see Figure 4). If the central atom is a ring-superatom and the valences of the ring-superatom are not identical then different ways of distributing the radicals around the center may yield different molecules. Labelling of the free valences of the central ring-superatom with radicals treated as labels (supplemented with adequate number of hydrogens to make up the total free valence of the ring-superatom) generates a complete and irredundant list of molecules. Thus >C==C< is labelled with the label set: .BEGIN NOFILL one of -C≡≡CH , two of -CH3 , and one of -H. .END There are two unique labellings as shown in Figure 4. .HD |D2. Method for odd number of total atoms.| With an odd number of total atoms, no structures can be generated with a bond centroid. Only atom centroid is possible. However, it is possible for structures to be built with a bivalent atom at the centroid. Thus the procedure outlined in Category B above is followed, in this case also allowing a bivalent atom as the centroid. .HD |D3. Construction of Radicals| The goal of this procedure is to generate all radicals from a list of atoms and ring-superatoms. A radical is defined to be an atom or superatom with a single free valence. When a composition of atoms and ring-superatoms is presented, from which radicals are to be constructed, two special cases are recognized. .HD |Special Case 1. Only One Atom in Composition.| When only one atom which is not a ring-superatom is in the list, only one radical is possible. For example, with one C, the radical -CH3 is the only possibility. .HD |Case 2. Only One Ring-superatom in Composition.| In this case, depending upon the symmetry of the ring-superatom, several radicals may be possible. This is determined by labelling the free valences of the ring-superatom with one label of a special type, a "radical-valence". .BEGIN NOFILL GROUP Example: A composition consists of one ring-superatom, α16.β C- /" >C< " \" C- α16β .END Two radicals result from labelling with one radical valence. .BEGIN NOFILL GROUP CH C- /" /" -CH< " CH2< " \" \" CH CH α17 18β .END .HD |GENERAL CASE| Radicals have uniquely defined centroids as well(10a,24). The centroid is always an atom of valence two or higher. The steps for construction of radicals are as follows. .HD |Step 1. Selection of Atom Centroid.| Any bivalent or higher valent atom or ring-superatom is a valid candidate to be the centroid of a radical. Thus, for example, for the composition -C≡C-, >C=C< (see part 1 in Figure 3) both are valid centroids (Figure 5). .HD |Step 2. Partition the Rest of the Atoms.| The atom chosen for the centroid is removed from the composition. One of the valences of the centroid is to remain free. Therefore, the rest of the atoms in the composition are partitioned into less than or equal to (valence of centroid - 1) parts. Of course, each part should satisfy the restriction on univalents (equation 7) but for constructing radicals there is no restriction on the size of the parts. .HD |Step 3. Form Radicals from Each Part.| The procedure to construct radicals is freshly invoked on each part thus constructing radicals. Each part in Figure 5 gives rise to only one radical, each arising from special case 2. .HD |Step 4. Combine Radicals in Each Part.| For the example in Figure 5, each part yields only one radical. In a more general situation, where the rest of the composition after selection of a centroid, is partitioned into several parts, and where each part yields several radicals, the radicals are combined to determine all unique combinations of radicals. .HD |Step 5. Label Central Atom with Radicals.| If the center is an atom (not a ring-superatom) then each unique combination defines a single unique radical. If the center is a ring-superatom, the radicals are determined by labelling the center with a set of labels which includes;I) the radicals; ii) a leading radical-valence; iii) an adequate number of hydrogens to make up the remaining free valences of the ring-superatom. One selection of center gives one radical and the other gives two more, to complete a list of three radicals for the example chosen (Fig. 5). .HD Summary For the example chosen to illustrate the operation of the acyclic generator, twelve isomers are generated, nine shown in Figure 3 and three shown in Figure 4.