10. Meta Data – Taxonomy.

Taxonomy is the classification of the SAIL objects by DART metadata, by SAIL file system metadata, by content type (magic type or MIME type) and by what are now called page attributes. Simply stated, the data objects tend to be either text or binary, and either public or private. Each object has a file name, several date-time stamps, a programmer code and a project code.

SAIL file system metadata

Filename, date-time written, file protection, file mode, and file length in PDP-10 words.
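A PDP-10 word is 36 bits wide, so a file length given in words does not translate directly into bytes. The usual PDP-10 convention for 7-bit text packs five characters per word, left-justified, with the least significant bit unused. The sketch below assumes SAIL text files follow that convention and that each word has already been extracted from the archive as a Python integer; how words are laid out in the DART byte vector is not covered here, and the function names are hypothetical.

```python
def unpack_7bit_word(word: int) -> list[int]:
    """Split one 36-bit PDP-10 word into five 7-bit character codes.

    Assumes the usual PDP-10 text packing: the first character occupies the
    most significant seven bits, and the least significant bit is unused.
    """
    if not 0 <= word < (1 << 36):
        raise ValueError("not a 36-bit word")
    # Shifts 29, 22, 15, 8, 1 select the five 7-bit fields from left to right.
    return [(word >> (29 - 7 * i)) & 0x7F for i in range(5)]


def unpack_text(words: list[int]) -> bytes:
    """Concatenate the 7-bit characters of a sequence of words, one per byte."""
    out = bytearray()
    for w in words:
        out.extend(unpack_7bit_word(w))
    return bytes(out)


def max_text_chars(length_in_words: int) -> int:
    """Upper bound on the characters a text file of this length can hold."""
    return 5 * length_in_words
```

Under this assumption a text file recorded as 3 words long holds at most 15 characters; a partially filled final word simply yields whatever bits it contains (zero bits become NUL codes).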

DART tape record metadata

DART metadata from the MCOPY run:
- Low density tape# and record#
- High density tape# and record#
Latter-day DART segmentation of the MCOPY:
- Byte offset in the 1998 DART byte vector: 0xAAABBBCCC
- DART segments (low density records + 63 gaps + 43 short skips)

Text versus Binary

The principal text editor was named “E”. Text files written with “E” always begin with an ASCII page table, embedded as a comment, that looks like this:


        COMMENT VALID 00002 PAGES
        C REC PAGE DESCRIPTION
        C00001 00001
        C00002 00002
        C00003 ENDMK
        C;
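Because E places this page table at the very start of a file, its presence is a cheap and fairly reliable signal that a file is E-edited text. The sketch below parses a table laid out as in the sample above. The regular expressions, the `EDirectory` record and `parse_e_directory` are illustrative assumptions, not a description of how E itself reads the table, and real files may contain variations the sample does not show.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Patterns modelled on the sample page table above (an assumption, not E's spec).
HEADER_RE = re.compile(r"^\s*COMMENT\b.*\bVALID\s+(\d+)\s+PAGES")
ENTRY_RE = re.compile(r"^\s*C(\d{5})\s+(\d{5}|ENDMK)\b\s*(.*)$")
TERMINATOR_RE = re.compile(r"^\s*C\S*;\s*$")


@dataclass
class EDirectory:
    valid_pages: int
    # (record, page, description) for each page entry
    pages: list = field(default_factory=list)
    end_record: Optional[int] = None  # record number on the ENDMK line
    body_start: int = 0               # index of the first line after the table


def parse_e_directory(text: str) -> Optional[EDirectory]:
    """Return the parsed page table, or None if the file does not start with one."""
    lines = text.splitlines()  # also handles SAIL's CR LF line endings
    if not lines:
        return None
    header = HEADER_RE.match(lines[0])
    if header is None:
        return None
    directory = EDirectory(valid_pages=int(header.group(1)))
    for i, line in enumerate(lines[1:], start=1):
        if TERMINATOR_RE.match(line):
            directory.body_start = i + 1
            return directory
        entry = ENTRY_RE.match(line)
        if entry is None:
            continue  # e.g. the "C REC PAGE DESCRIPTION" column heading
        record = int(entry.group(1))
        if entry.group(2) == "ENDMK":
            directory.end_record = record
        else:
            directory.pages.append((record, int(entry.group(2)), entry.group(3).strip()))
    return None  # header found but no terminating "C;" line: treat as malformed
```

A `None` result only says the file does not open with an E page table; it may still be text produced by some other program. As a sanity check, `valid_pages` can be compared with `len(pages)`.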
  

Role of the Digital Curator

SailDart cooked metadata

Pub = True or False.

Here 'Pub' means that the SailDart web server will serve the item. It is a manual classification, made under latter-day SailDart policy, that uses date spans and PPN codes to determine which files belonged to which private individuals. Files that are inappropriate to distribute on the internet in 2014 are marked Pub=False. However, many files are marked Pub=True. Some were highly visible during the period 1972 to 1990, both at terminals in Stanford facilities and over telephone modems and the nascent ARPA network that became the TCP/IP internet. Others are public because the known author has been contacted and has released the files for unrestricted web serving. Finally, I wish to note that Pub=False is a rather weak security classification, comparable to the US government marking FOUO (For Official Use Only), in that the material remains available as a collection for academic archival study: a handful of unencrypted time-capsule copies are distributed around the world.
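The Pub decision is described above as manual, guided by date spans and PPN codes, with explicit releases by contacted authors. Purely to illustrate how such a policy could be recorded and applied consistently, the sketch below assumes a hypothetical table of per-programmer private date spans and a hypothetical set of released programmer codes. Defaulting to public when no private span matches is an assumption of this sketch, not SailDart's documented rule.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PrivateSpan:
    """Hypothetical policy entry: this programmer's files written in the span are private."""
    programmer: str
    start: date
    end: date


@dataclass(frozen=True)
class FileEntry:
    filename: str
    project: str
    programmer: str
    written: date


def is_pub(entry: FileEntry, private_spans: list[PrivateSpan], released: set[str]) -> bool:
    # An author who has been contacted and released their files wins outright.
    if entry.programmer in released:
        return True
    # Otherwise the file is private if its write date falls in one of its owner's spans.
    for span in private_spans:
        if span.programmer == entry.programmer and span.start <= entry.written <= span.end:
            return False
    # Assumed default for this sketch: everything else is served.
    return True
```

For example, with a hypothetical span for programmer `ABC` covering 1980 to 1985, a file written by `ABC` in 1982 is private unless `ABC` appears in the released set.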

Redacted = True or False.

Very few items are on the manual redact list. Hundreds of thousands of DART entries, however, are redacted because they exactly duplicate another entry in name, date, owner, protection and content. The duplication is deliberate: as DART policy, SAIL files were intentionally written twice to the permanent MCOPY tapes.
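Since every file was deliberately written twice, redacting the duplicates amounts to grouping DART entries on the five attributes listed above and keeping one representative per group. The sketch assumes each entry is a dictionary with hypothetical keys `name`, `written`, `owner`, `protection` and `content` (raw bytes), and compares content through a SHA-256 digest rather than byte by byte. Which copy SailDart actually keeps is not stated here; this sketch simply keeps the first one it encounters.

```python
import hashlib


def duplicate_key(entry: dict) -> tuple:
    """Key on name, date, owner, protection and a digest of the content."""
    digest = hashlib.sha256(entry["content"]).hexdigest()
    return (entry["name"], entry["written"], entry["owner"], entry["protection"], digest)


def mark_redacted(entries: list[dict]) -> list[bool]:
    """Return a Redacted flag per entry: False for the first copy, True for exact repeats."""
    seen = set()
    flags = []
    for entry in entries:
        key = duplicate_key(entry)
        flags.append(key in seen)
        seen.add(key)
    return flags
```

Entries on the manual redact list would be flagged separately and combined with this result.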

Type. Text versus Binary.

The SAIL computer system encoded text in a non-standard 7-bit character code similar to ASCII. Furthermore, the custom SAIL keyboards had extra shift modes (named META and TOP) as well as two special interrupt keys named CALL and BREAK, and ALT was a character, not a shift mode. End of line was marked by <Carriage Return><Line Feed>. Some files even have null padding between the <CR> and the <LF> so that a mechanical teletype head would have enough time to "return the carriage" from the right side of the page to the left.

When first converting the heap of SAIL files, it seemed obvious that they could be split into text files and binary files. This is ultimately misleading, since computer software has both source text files and binary executable files. Likewise, a piece of music has a source text and a binary representation, and academic papers and technical documentation are written in markup-language source files (e.g. PUB and TeX) that have to be compiled to generate printed copy.
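When converting SAIL text for modern tools, the <CR><LF> line endings, including the NUL-padded variant described above, can be normalized in a single pass. The sketch assumes the file has already been unpacked to one byte per 7-bit character and that NUL bytes are the only padding used; it handles line endings only and does not translate SAIL's non-ASCII character codes. The function names are hypothetical.

```python
import re

# CR, then any run of NUL padding bytes, then LF.
_EOL = re.compile(rb"\r\x00*\n")
_PADDED_EOL = re.compile(rb"\r\x00+\n")


def normalize_line_endings(raw: bytes) -> bytes:
    """Collapse <CR>[NUL...]<LF> line endings to a single LF."""
    return _EOL.sub(b"\n", raw)


def padded_eol_count(raw: bytes) -> int:
    """Count line endings that carry teletype NUL padding, e.g. to flag such files."""
    return len(_PADDED_EOL.findall(raw))
```

A bare <CR> without a following <LF> is left untouched, since its meaning in a given file is not established here.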

Copyright Status.