Time-stamp: <2012-02-03 23:37:17 root>
(CET)

Originally posted 2008-10-20.
Copyright (c) 2008-2011 Craig Latta. All rights reserved.


Hi--

     I've been on a quest to make Squeak smaller and more modular: the
Spoon project[1]. Part one was making the object memory small. Part
three is about making the virtual machine small. This message is about
part two, making a module system suitable for adding new behavior to a
minimal system in an organized way, and for transferring behavior
accurately between running systems.

     Spoon's module system is called "Naiad", which is an acronym for
"Name And Identity Are Distinct". It keeps track of the development
history of a system (what the "sources" and "changes" files are for
now), and makes it available for exchange with other systems. I think
keeping classes' names and identities separate is critical for
this. Following are some notes on its design and use, including the
object model[3].

     At this point I'd like to emphasize I am the author of this
design, that I intend to release its implementation under an MIT-style
license, and that I'd like to pursue a graduate degree with it.

***

motivation

     A traditional Smalltalk system uses source code to express both
development history and changes exchanged between systems. The precise
meaning of source code depends on the current state of the system
compiling it, and the current state of the system that will run
it. Keeping these systems separate helps, but since they are both
dynamic, source code is an inherently ambiguous medium across time.

     The most problematic system artifacts in light of this ambiguity
are classes. All activity in a Smalltalk system is the result of
sending messages to objects. The sending of a message invokes the
execution of a method, a sequence of instructions for a virtual
processor. Some of these instructions manipulate the state of the
object receiving the message. Classes define the structure of that
state. Therefore, when those class definitions change, the source code
for the methods of those classes may become meaningless.

     One may confront this situation when trying to recompile source
code for an old version of a method whose class definition has changed
in the meantime. Similarly, source code from one system may not be
meaningful on another, since corresponding class definitions on each
system may change independently (or be removed entirely).

     This means that the accurate exchange of behavior requires manual
labor, hindering the propagation of useful fixes and new code. It also
means that interpretation and use of historical code is more difficult
than necessary. So we pay twice for this problem: when learning the
system, and when trying to share our work with others. By separating
class name from identity, Naiad makes Smalltalk more approachable for
newcomers, and more productive for developer and user communities.


editions


     Using Naiad, each development system consists of two object
memories: one containing developed code, and another containing
"editions" which describe that code. I'll call the first one the
"subject memory" and the other the "history memory".

     An Edition is a description of some artifact in the subject
memory at some point in time. The artifact kinds currently supported
are author, comment, tag, class, method, module, checkpoint, and
edit. Each edition has a reference to that artifact's next state in
the future (the next edition) and in the past (the previous edition),
as well as an author edition, a collection of licenses, and a
timestamp.

     An Edit represents the activation of some edition at a point in
time. For example, there may be a method created in 2005 that is
removed in 2006 and reactivated in 2007. There would be an Edit for
each of those three events, but only two method editions (one
representing the method becoming active, and one representing it being
removed).
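
     The distinction between editions and edits can be sketched in a
few lines of Python. This is only an illustration of the example
above, not Naiad's Smalltalk implementation; the class and field names
are hypothetical stand-ins for the ones in the object model[3].

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Edition:
    """A description of an artifact's state at some point in time."""
    description: str
    previous: Optional["Edition"] = None
    next: Optional["Edition"] = None


@dataclass(eq=False)
class Edit:
    """The activation of one edition at a point in time."""
    year: int
    edition: Edition


# Two method editions: the method becoming active, and its removal.
present = Edition("method present")
removed = Edition("method removed", previous=present)
present.next = removed

# Three edits: created in 2005, removed in 2006, reactivated in 2007.
edits = [Edit(2005, present), Edit(2006, removed), Edit(2007, present)]

distinct_editions = {edit.edition for edit in edits}
active_edition = edits[-1].edition
print(len(edits), len(distinct_editions), active_edition.description)
```

     Reactivating the method in 2007 adds a third Edit but reuses the
original edition, which is why the counts differ.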

     The history memory replaces the current changes and sources
files. It has an instance of EditHistory corresponding to the subject
memory, which records the active (current) editions for the classes,
methods, modules, and authors in the subject memory. It also keeps the
subject memory's id and the last Edit made to the subject memory.

     Every time the subject memory adds, changes, or removes a class
definition, method, author, comment, tag, or module, or makes a
checkpoint (i.e., makes an edit), it adds the appropriate editions to
the history memory via remote messages. The history memory snapshots
itself after every edit, so as to provide crash recovery support.

     The subject memory keeps a remote reference to the history
memory's instance of EditHistory as a class variable of the local
EditHistory class, and interacts with it using utility messages sent
to the local EditHistory class. The history memory also keeps that
EditHistory instance as a class variable of its local EditHistory
class, but as a local reference.

     An edition typically elides some of its references when it is
transferred out of a history memory. For example, a transferred
edition will usually omit the references to its next and previous
editions. The requesting subject memory can calculate the ID of those
editions and obtain them with a separate request, if necessary.

     A subject memory may elect to keep its EditHistory instance as a
local object, such as in a situation where one wants some limited
immutable history for debugging purposes, and no crash recovery
support. Whether in this scenario or in normal development, the same
EditHistory utility messages suffice, since no special code need be
written to support remote objects. If no edits will be made during
deployment, and no history retrieval is required, one may simply
jettison the history memory. One may always reconnect the subject and
history memories at a later time and continue development.
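
     A minimal Python sketch of that arrangement follows. It assumes
a proxy object standing in for Spoon's remote references; the names
HistoryMemory, RemoteReference, and record are hypothetical and are
not taken from Naiad. The point is only that the utility message on
the local class is the same in both configurations.

```python
class HistoryMemory:
    """Stand-in for the EditHistory instance living in a history memory."""

    def __init__(self):
        self.edits = []

    def record(self, edit):
        self.edits.append(edit)
        return len(self.edits)


class RemoteReference:
    """Hypothetical proxy: forwards each message as if across memories."""

    def __init__(self, target):
        self._target = target
        self.messages_forwarded = 0

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself.
        method = getattr(self._target, name)

        def forward(*arguments):
            self.messages_forwarded += 1
            return method(*arguments)

        return forward


class EditHistory:
    """The local class; `current` plays the role of the class variable."""
    current = None

    @classmethod
    def record(cls, edit):
        # Same utility message whether `current` is local or remote.
        return cls.current.record(edit)


history = HistoryMemory()

proxy = RemoteReference(history)
EditHistory.current = proxy            # normal development
EditHistory.record("add method")

EditHistory.current = history          # local history, no proxy
EditHistory.record("remove method")
```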

     The subject memory has tools for browsing and activating the
editions, wherever they are located. This means that no special tools
are needed to browse the artifacts of multiple subject systems; one
uses the same tools as for browsing the artifacts of the local subject
memory. Each subject memory may connect to multiple history memories
concurrently (if allowed).

     For that matter, the history memories of multiple systems may
connect to each other directly, to aggregate editions from multiple
people, for example.


class and method IDs


     Each class in the subject memory has a universally-unique
identifier[2], or UUID. The classes in the minimal subject memory are
assigned UUIDs before the initial release, and all subsequent classes
are assigned UUIDs when created. Rather than use the single word
"class" to refer to either a metaclass or to its sole instance, Spoon
introduces the term "protoclass". For example, (Array class) is a
metaclass, and its sole instance, Array, is a protoclass. Each
metaclass and protoclass has its own UUID, called a "base ID". This is
supported by a new instance variable in ClassDescription.

     Each version of each class is identified by a ClassID, a byte
array with segments for the class's baseID, author UUID, and a
sixteen-bit version. This means we can uniquely identify, for each
author, 65,535 versions of each class in the system. Since we identify
authors by UUID, the number of possible authors is very large.

     Each version of each method is identified by a MethodID, a byte
array which contains a ClassID and segments for the method's selector,
author UUID, and a sixteen-bit version. This means we can uniquely
identify, for each author, 65,535 versions of each method in each
version of each class in the system.
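
     The byte layout can be illustrated with a short Python sketch
that follows the offsets given in the object model[3]. The byte order
of the version field and the encoding of the selector are not stated
in this message, so big-endian and ASCII are assumptions here, as are
the function names.

```python
import struct
import uuid


def make_class_id(base_id: uuid.UUID, author: uuid.UUID, version: int) -> bytes:
    """Bytes 1-16: class base ID; 17-32: author UUID; 33-34: version."""
    # Big-endian byte order for the version is an assumption.
    return base_id.bytes + author.bytes + struct.pack(">H", version)


def make_method_id(class_id: bytes, author: uuid.UUID, version: int,
                   selector: str) -> bytes:
    """A ClassID, then method author UUID, version, and selector."""
    # ASCII encoding of the selector is an assumption.
    return (class_id + author.bytes + struct.pack(">H", version)
            + selector.encode("ascii"))


def parse_method_id(method_id: bytes) -> dict:
    """Split a MethodID back into its segments (0-based slices)."""
    return {
        "class_base_id": uuid.UUID(bytes=method_id[0:16]),
        "class_author": uuid.UUID(bytes=method_id[16:32]),
        "class_version": struct.unpack(">H", method_id[32:34])[0],
        "method_author": uuid.UUID(bytes=method_id[34:50]),
        "method_version": struct.unpack(">H", method_id[50:52])[0],
        "selector": method_id[52:].decode("ascii"),
    }


base_id = uuid.uuid4()
author = uuid.uuid4()
class_id = make_class_id(base_id, author, 3)
method_id = make_method_id(class_id, author, 1, "at:put:")
fields = parse_method_id(method_id)
```

     Because a MethodID begins with a complete ClassID, a method
version is always tied to one specific version of its class.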


method editions and method literal markers


     Each MethodEdition holds a reference to the corresponding
ClassEdition, the method source code, and the information needed to
reconstruct the corresponding CompiledMethod directly, without need of
the compiler (the method header, initial and final program-counter
values, method literal markers, and instructions). If one will never
use the history memory to install methods in a subject memory that
lacks a compiler, one could drop the compiled method information to
save space.

     Method literal markers are used to transmit a compiled method's
literal frame values between object memories. There are method literal
marker classes to support references to classes, class variables,
other pool variables, and literal objects, and to support methods
which perform class-side super-sends. Each method literal marker
instance knows how to serialize itself as part of Spoon's remote
messaging system. In particular, when a method literal marker that
refers to a class transmits itself, it transmits the ClassID of that
class, not the name of the class.

     This gets at the namesake concept of Naiad, "Name And Identity
Are Distinct". We can transfer methods between systems directly using
method editions in remote messages; textual source code is
optional. When referring to a class in a method edition, we never need
to use its textual name. Each version of each class is an object with
a distinct identity. By using ClassIDs to refer to each of them, we
can avoid using class names at all when storing history or
distributing code. This means that the textual name of each class can
be anything, as far as the system is concerned.

     With every class name unconstrained, there is no need for class
"namespaces" to distinguish between classes which happen to have the
same name at some point in time. Each class effectively has its own
namespace, since it is uniquely identifiable regardless of its name.

     Developer tools armed with this information can resolve ambiguity
for humans browsing and changing the system. If a developer writes a
method which uses a name shared by multiple classes, the system can
present more information about each of those classes (such as the
author, time of creation, version, and module association), so that
the developer can choose the intended one. When browsing such a
method, the system can distinguish the aliased class name visually,
indicating that there is disambiguating information available.
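
     As a rough Python illustration of what such a tool might do, the
registry below keys classes by ID and treats the name as an ordinary
attribute. Everything here is hypothetical: the ClassRegistry class,
the random 16-byte stand-ins for ClassIDs, and the sample authors and
modules are invented for the example.

```python
import uuid


class ClassRegistry:
    """Classes keyed by ID; a name is just one of their attributes."""

    def __init__(self):
        self._by_id = {}

    def add(self, name, author, module):
        class_id = uuid.uuid4().bytes  # stand-in for a real ClassID
        self._by_id[class_id] = {
            "name": name, "author": author, "module": module}
        return class_id

    def rename(self, class_id, new_name):
        # Identity is untouched; only the label changes.
        self._by_id[class_id]["name"] = new_name

    def candidates(self, name):
        """What a tool could show a developer who typed `name`."""
        return [(class_id, info) for class_id, info in self._by_id.items()
                if info["name"] == name]

    def is_aliased(self, name):
        return len(self.candidates(name)) > 1


registry = ClassRegistry()
geometry_point = registry.add("Point", "alice", "Geometry")
sales_point = registry.add("Point", "bob", "PointOfSale")
registry.add("Array", "alice", "Collections")

choices = registry.candidates("Point")
```

     A browser could mark "Point" as aliased and list both candidates
with their authors and modules, while "Array" needs no such treatment.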


class editions and shared variables


     Each ClassEdition holds the editions for all the method versions
currently active in the corresponding class in the subject
memory. Since every edition keeps a reference to its previous and next
editions, one can trace the history of any method by starting at the
active edition. Removed methods are represented by method editions
which have the same MethodID as a normal previous method edition, but
with the rest of the fields set to nil.
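
     Tracing a method's history through those links might look like
the following Python sketch. The field names are loosely modeled on
MethodEdition in the object model[3], but the helper functions, the
placeholder IDs, and the use of None for nil are assumptions made for
illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class MethodEdition:
    method_id: bytes
    source: Optional[str] = None          # None in a removal edition
    instructions: Optional[bytes] = None  # None in a removal edition
    previous: Optional["MethodEdition"] = None
    next: Optional["MethodEdition"] = None

    def is_removal(self) -> bool:
        return self.source is None and self.instructions is None


def append_edition(active, new):
    """Link `new` as the successor of `active` and return it."""
    new.previous = active
    if active is not None:
        active.next = new
    return new


def history(active):
    """Yield editions from the active one back to the oldest."""
    edition = active
    while edition is not None:
        yield edition
        edition = edition.previous


v1 = append_edition(None, MethodEdition(b"id-v1", "source v1", b"\x01"))
v2 = append_edition(v1, MethodEdition(b"id-v2", "source v2", b"\x02"))
# Removal: same MethodID as the previous edition, other fields empty.
removal = append_edition(v2, MethodEdition(b"id-v2"))

trail = list(history(removal))
```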

     Each ClassEdition also holds the information needed to
reconstruct the corresponding class directly, without need of the
class builder. For all classes, this includes the format, instance
variable names, and superclass ID. For protoclasses, it also includes
the class pool keys, class name, and received pool IDs.

     In Spoon, every shared variable pool is the responsibility of
some class in the system. There is no global variables pool ("system
dictionary"). Each class that defines a pool is said to "publish" that
pool; classes which use that pool "receive" it. Spoon adds an instance
variable to Class to map published pools to their names. Each
ReceivedPoolID that a protoclass edition uses is a byte array which
contains a class ID and a published pool name.


checkpoints and modules


     A Checkpoint edition is simply a named marker of a particular
point in time. A developer may use checkpoints to indicate various
interesting states of development, and use the tools to regress or
replay edits made before or after that time.

     The largest unit of work is represented by module editions. They
are named collections of method IDs, indicating the specific versions
of methods which comprise a module, along with sets of child, parent,
prerequisite, and postrequisite module editions. When a module edition
is transferred out of a history memory, those edition references are
transmitted as ModuleIDs. Each module edition also has an
"antimodule", a module edition calculated at installation time by a
receiving system which, if applied, would undo the changes made by
installing the original module. Finally, each module edition has a URI
by which someone at a remote site may install the module.
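
     The antimodule idea can be sketched in Python as follows. This
is a simplification under stated assumptions: the system's active
methods and the module are both modeled as dictionaries from a method
"slot" to a method ID, with None meaning "no method", and the install
function and sample IDs are hypothetical rather than Naiad's own.

```python
def install(active_methods: dict, module: dict) -> dict:
    """Apply `module` to `active_methods` in place; return the antimodule.

    The antimodule records, for every slot the module touches, what was
    active there before installation (None if the slot was empty).
    """
    antimodule = {}
    for slot, method_id in module.items():
        antimodule[slot] = active_methods.get(slot)
        if method_id is None:
            active_methods.pop(slot, None)
        else:
            active_methods[slot] = method_id
    return antimodule


system = {"Array>>first": "first-v1", "Array>>last": "last-v1"}
before = dict(system)

module = {
    "Array>>first": "first-v2",    # replaces an existing method
    "Array>>second": "second-v1",  # adds a new method
    "Array>>last": None,           # removes a method
}

antimodule = install(system, module)
after_install = dict(system)

# Installing the antimodule undoes the original installation.
install(system, antimodule)
```

     The antimodule has to be calculated by the receiving system
because it depends on what was active there before installation.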

     That URI represents a command to a Spoon system running on a
requestor's local machine; it refers to a standard port on
localhost. Its path is a text-encoded action, containing an
instruction (in this case "install a module"), the hostname and port
of a Spoon system providing the module, and the module's ID. The
receiving system uses this information to request the module from a
providing history memory, which then transmits editions as
necessary. Exactly which editions are transmitted depends on the state
of the receiving system; this is a two-way conversation between the
providing and receiving systems. This is often more time and space
efficient than simply providing all of a module's code, which is what
happens with traditional static representations like change sets.
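
     A toy Python version of that conversation is shown below. The
Provider and Receiver classes, their method names, and the string
stand-ins for IDs and editions are all hypothetical; the actual
exchange happens through Spoon's remote messaging and is not
specified in this message.

```python
class Provider:
    """Stand-in for a providing history memory."""

    def __init__(self, modules, editions):
        self.modules = modules      # module ID -> list of method IDs
        self.editions = editions    # method ID -> edition payload

    def manifest(self, module_id):
        return list(self.modules[module_id])

    def fetch(self, method_ids):
        return {m: self.editions[m] for m in method_ids}


class Receiver:
    """Stand-in for the system installing the module."""

    def __init__(self, editions):
        self.editions = dict(editions)

    def install(self, provider, module_id):
        wanted = provider.manifest(module_id)
        # Ask only for what this system does not already have.
        missing = [m for m in wanted if m not in self.editions]
        self.editions.update(provider.fetch(missing))
        return missing


provider = Provider(
    {"module-1": ["m1", "m2", "m3"]},
    {"m1": "edition 1", "m2": "edition 2", "m3": "edition 3"},
)
receiver = Receiver({"m1": "edition 1"})

transmitted = receiver.install(provider, "module-1")
transmitted_again = receiver.install(provider, "module-1")
```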

     The URIs may be cited on ordinary webpages, which are indexed by
search engines like Google. A person in search of a module for a
particular purpose can search for it with a web browser, using those
search engines. Having found a module's URI, the person can click on
it, establishing a connection to an embedded webserver in their local
Spoon system, which carries out the URI's command.

     This mechanism for code distribution avoids storing code in
static files. It's a departure from Smalltalk's traditional "fileout"
mechanism.

     The encoded URIs can serve other functions as well, such as
listing a system's installed modules, removing an installed module,
making a snapshot, and quitting the system. In this way one can use a
web browser to interact with a Spoon system for several basic tasks;
this is especially useful when the system is headless (e.g., in its
initial minimal state).
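
     Since this message does not give the text encoding of those
actions, the Python sketch below invents one purely for illustration:
slash-separated, percent-encoded path segments. The port number, the
instruction names, the example hostname, and both function names are
placeholders and should not be read as Spoon's actual format.

```python
from urllib.parse import quote, unquote, urlsplit

LOCAL_PORT = 8090  # placeholder; the standard port is not named here


def encode_action(instruction, *arguments):
    """Build a localhost URI whose path carries the action."""
    segments = (instruction, *arguments)
    path = "/".join(quote(str(s), safe="") for s in segments)
    return f"http://localhost:{LOCAL_PORT}/{path}"


def decode_action(uri):
    """Recover the instruction and its arguments from an action URI."""
    parts = urlsplit(uri)
    if parts.hostname != "localhost":
        raise ValueError("action URIs address the requestor's own machine")
    segments = [unquote(s) for s in parts.path.lstrip("/").split("/")]
    instruction, *arguments = segments
    return instruction, arguments


# "Install a module": provider hostname, provider port, module ID.
install_uri = encode_action("install", "modules.example.org", 8081, "00ff")
list_uri = encode_action("list")

install_action = decode_action(install_uri)
list_action = decode_action(list_uri)
```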


comments and tags


     Editions for authors, classes, methods, checkpoints, edits, and
modules each have their own comment and tag editions. This means each
one of those artifacts has a comment and tags, and the changes in both
are recorded over time. Comments are as we've already been using them:
they're explanatory prose about the artifacts. Tags may be familiar to
you from the web; they are short semantic markers used for grouping
similar artifacts.

     I intend for tags to replace class and method
categories. Nominally, we've been using class and method categories to
establish semantic hierarchies, but the hierarchies have turned out to
be quite shallow. Although we can form hierarchies with tags as well,
I think we would do better to apply the sorts of algorithms that
search engines use, and not concern ourselves with memorizing an
artifact's semantic markers. The computational cost this incurs for
the tools might have been high in the early days of Smalltalk, but it
is quite modest now.
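
     One simple search-engine-style technique is an inverted index
from tags to artifacts, ranked by the number of matching tags. The
Python sketch below is only an assumption about what tools might do;
the TagIndex class and the sample tags are hypothetical.

```python
from collections import Counter, defaultdict


class TagIndex:
    """Inverted index: tag -> set of artifact names."""

    def __init__(self):
        self._artifacts_by_tag = defaultdict(set)

    def tag(self, artifact, *tags):
        for t in tags:
            self._artifacts_by_tag[t.lower()].add(artifact)

    def search(self, *query_tags):
        """Rank artifacts by how many of the query tags they carry."""
        scores = Counter()
        for t in query_tags:
            for artifact in self._artifacts_by_tag.get(t.lower(), ()):
                scores[artifact] += 1
        # Highest score first; ties broken alphabetically.
        return sorted(scores, key=lambda a: (-scores[a], a))


index = TagIndex()
index.tag("OrderedCollection", "collections", "ordered")
index.tag("Set", "collections")
index.tag("Socket", "network")

results = index.search("collections", "ordered")
```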


     Thanks for reading! Please let me know of any questions or other
feedback, and feel free to discuss this on the Spoon and Squeak-dev
mailing lists.


-C

[1] http://netjam.org/spoon
[2] http://en.wikipedia.org/wiki/Universally_Unique_Identifier
[3]

     Here is a list of the key changes and additions that Naiad makes
to the Smalltalk class hierarchy. Class names are indented to show
inheritance relationships. Ellipses indicate that some existing
superclasses have been elided from the list (but not removed from the
system). Instance variables are listed to the right in parentheses.

Object                                             ()

     ...ByteArray                                  ()

          ClassID                                  (class UUID in bytes 1-16
                                                    class author UUID in bytes 17-32
                                                    class version in bytes 33-34)

               MethodID                            (method author UUID in bytes 35-50
                                                    method version in bytes 51-52
                                                    method selector in bytes 53 to end)

               ReceivedPoolID                      (pool name in bytes 35 to end)

          ModuleID                                 (module ID in bytes 1-16
                                                    author ID in bytes 17-32
                                                    version in bytes 33-34)

          UUID                                     (16 bytes, see the Leach/Salz UUID spec)

     ...ClassDescription                           (baseID
                                                    instanceVariables)

          Class                                    (classVariablesPool
                                                    name
                                                    publishedPools
                                                    receivedPools
                                                    subclasses
                                                    tags)

     EditHistory                                   (activeAuthorEditions
                                                    activeCheckpoint
                                                    activeClassEditions
                                                    activeModuleEditions
                                                    currentAuthor
                                                    id
                                                    lastEdit)

     Edition                                       (author
                                                    licenses
                                                    nextEdition
                                                    previousEdition
                                                    timestamp)

          CommentEdition                           (comment
                                                    commentedEdition)

          CommentedEdition                         (activeCommentEdition)

               TaggedEdition                       (activeTagsEdition
                                                    id)

                    AuthorEdition                  (died
                                                    emailAddress
                                                    name
                                                    website)

                    BehavioralEdition              ()

                         ClassEdition              (activeMethodEditions
                                                    format
                                                    instanceVariableNames
                                                    superclassID)

                              MetaclassEdition     (protoclassEditions)

                              ProtoclassEdition    (classPoolKeys
                                                    name
                                                    receivedPoolIDs)

                         MethodEdition             (classEdition
                                                    endPC
                                                    header
                                                    initialPC
                                                    instructions
                                                    literalMarkers
                                                    source)

                    Checkpoint                     (name)

                    Edit                           (edition)

                    ModuleEdition                  (antimodule
                                                    children
                                                    installationURI
                                                    methodIDs
                                                    name
                                                    parents
                                                    postrequisites
                                                    prerequisites)

               TagsEdition                         (tags
                                                    taggedEdition)

     LiteralMarker                                 ()

          BehavioralLiteralMarker                  (class)

               ClassLiteralMarker                  ()

               MetaSuperSendLiteralMarker          ()

               SharedVariableLiteralMarker         (key)

                    ClassVariableLiteralMarker     ()

                    PublishedVariableLiteralMarker ()

          IdentityLiteralMarker                    (literal)

--
Craig Latta
improvisational musical informaticist
www.netjam.org
Smalltalkers do: [:it | All with: Class, (And love: it)]