
Theme One Program • Blog


Author: Jon Awbrey




Theme One Program

Introduction

Exposition 1

Theme One is a program for building and transforming a particular species of graph-theoretic data structures, forms designed to support a variety of fundamental learning and reasoning tasks.

The program evolved over the course of an exploration into the integration of contrasting types of activities involved in learning and reasoning, especially the types of algorithms and data structures capable of supporting all sorts of inquiry processes, from everyday problem solving to scientific investigation.  In its current state, Theme One integrates over a common data structure fundamental algorithms for one type of inductive learning and one type of deductive reasoning.

We begin by describing the class of graph-theoretic data structures used by the program, as determined by their local and global features.  It will be the usual practice to shift around and view these graphs at many different levels of detail, from their abstract definition to their concrete implementation, and many points in between.

The main work of the Theme One program is achieved by building and transforming a single species of graph-theoretic data structures.  In their abstract form these structures are closely related to the graphs called cacti and conifers in graph theory, so we'll generally refer to them under those names.

The Idea-Form Flag

The graph-theoretic data structures used by the program are built up from a basic data structure called an idea-form flag.  That structure is defined as a pair of Pascal data types by means of the following specifications.

Theme Exposition Type Idea = ^Form.png
  • An idea is a pointer to a form.
  • A form is a record consisting of:
    • A sign of type char;
    • Four pointers, as, up, on, by, of type idea;
    • A code of type numb, that is, an integer in [0, max integer].
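
Rendered in Pascal, the specification above comes out roughly as follows.  This is an illustrative sketch rather than the program's own source:  the spellings as_ and on_ stand in for the pointer names as and on, which collide with reserved words in some Pascal dialects, and numb is simply taken to be the subrange of integers from 0 to maxint.

  type
    numb = 0..maxint;             { an integer in [0, maxint] }
    idea = ^form;                 { an idea is a pointer to a form }
    form = record
      sign : char;                { a sign of type char }
      as_, up, on_, by : idea;    { four pointers of type idea }
      code : numb                 { a code of type numb }
    end;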

Represented in terms of digraphs, or directed graphs, the combination of an idea pointer and a form record is most easily pictured as an arc, or directed edge, leading to a node labeled with the other data, in this case, a letter and a number.

At the roughest but quickest level of detail, an idea of a form can be drawn as follows.

Theme Exposition Idea^Form Node.png

When it is necessary to fill in more detail, the following schematic pattern can be used.

Theme Exposition Idea^Form Flag.png

The idea-form type definition determines the local structure of the whole host of graphs used by the program, including a motley array of ephemeral buffers, temporary scratch lists, and other graph-theoretic data structures used for their transient utilities at specific points in the program.

I will put off discussing the more incidental graph structures until the points where they actually arise, focusing here on the particular varieties of cactoid graphs which constitute the main formal media of the program's operation.

Exposition 2

The previous post described the elementary data structure used to represent nodes of graphs in the Theme One program.  This post describes the specific family of graphs employed by the program.

Painted And Rooted Cacti

Figure 1 shows a typical example of a painted and rooted cactus.

Theme Exposition Painted And Rooted Cactus Display.png

The graph itself is a mathematical object and does not inhabit the page or other medium before our eyes, and it must not be confused with any picture or other representation of it, any more than we would want someone to confuse us with a picture of ourselves.  Still, it is a fair enough picture, once we understand the conventions of representation involved.

Let a graph have a set of nodes and let there be a set of identifiers.  We often find ourselves in situations where we have to consider many different ways of associating the nodes of the graph with the identifiers in the set.  Various manners of associating nodes with identifiers have been given conventional names by different schools of graph theorists.  I will give one way of describing a few of the most common patterns of association, with a symbolic summary after the list.

  • A graph is painted if there is a relation between its node set and a set of identifiers, in which case the relation is called a painting and the identifiers are called paints.
  • A graph is colored if there is a function from its node set to a set of identifiers, in which case the function is called a coloring and the identifiers are called colors.
  • A graph is labeled if there is a one-to-one mapping between its node set and a set of identifiers, in which case the mapping is called a labeling and the identifiers are called labels.
  • A graph is said to be rooted if it has a unique distinguished node, in which case the distinguished node is called the root of the graph.  The graph in Figure 1 has a root node marked by the “at” sign or amphora symbol “@”.
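
Stated compactly, writing $N$ for the node set and $I$ for the identifier set, the three patterns of association come to this.

  • A painting is a relation $P \subseteq N \times I$, so a node may bear any number of paints, including none.
  • A coloring is a function $c : N \to I$, so each node bears exactly one color.
  • A labeling is a bijection between $N$ and $I$, so nodes and labels correspond one to one.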

The graph in Figure 1 has eight nodes together with a set of five paints.  The painting of nodes is indicated by drawing the paints of each node next to the node they paint.  Observe that some nodes may be painted with an empty set of paints.

The structure of a painted and rooted cactus can be encoded in the form of a character string called a painted and rooted cactus expression.  For the remainder of this discussion the terms cactus and cactus expression will be used to mean the painted and rooted varieties.  A cactus expression is formed on an alphabet consisting of the relevant set of identifiers, the paints, together with three punctuation marks:  the left parenthesis, the comma, and the right parenthesis.

Exposition 3

Coding Logical Graphs

My earliest experiments coding logical graphs as dynamic “pointer” data structures taught me that conceptual and computational efficiencies of a critical sort could be achieved by generalizing their abstract graphs from trees to the variety graph theorists know as cacti.  The genesis of that generalization is a tale worth telling another time, but for now it's best to jump right in and proceed by way of generic examples.

Figure 1 shows a typical example of a painted and rooted cactus.

Theme Exposition Painted And Rooted Cactus Display.png

Figure 2 shows a way to visualize the correspondence between cactus graphs and cactus strings, demonstrated on the cactus from Figure 1.  By way of convenient terminology, the polygons of a cactus graph are called its lobes.  An edge not part of a larger polygon is called a 2‑gon or a bi‑gon.  A terminal bi‑gon is called a spike.

Theme Exposition Cactus Graph and Cactus Expression.png

The correspondence between a cactus graph and a cactus string is obtained by an operation called traversing the graph in question.

  • One traverses a cactus graph by beginning at the left hand side of the root node, reading off the list of paints one encounters at that point.  Since the order of elements at any node is not significant, one may start the cactus string with that list of paints or save them for the end.  We have done the latter in this case.
  • One continues by climbing up the left hand side of the leftmost lobe, marking the ascent by means of a left parenthesis, and traversing whatever cactus one happens to reach at the first node above the root.  That done, one proceeds from left to right along the top side of the lobe, marking each interlobal span by means of a comma and traversing each cactus one meets along the way.  On completing the last of them, one climbs down the right hand side of the lobe, marking the descent by means of a right parenthesis, and then traverses, in left to right order, each remaining cactus incident with the root node.

The string of letters, parentheses, and commas one obtains by this procedure is called the traversal string of the graph, in this case, a cactus string.

Exposition 4

Parsing Logical Graphs

It is possible to write a program that parses cactus expressions into reasonable facsimiles of cactus graphs as pointer structures in computer memory, making edges correspond to addresses and nodes correspond to records.  I did just that in the early forerunners of the present program, but it turned out to be a more robust strategy in the long run, despite the need for additional nodes at the outset, to implement a more articulate but more indirect parsing algorithm, one in which the punctuation marks are not just tacitly converted to addresses in passing, but instead recorded as nodes in roughly the same way as the ordinary identifiers, or paints.

Figure 3 illustrates the type of parsing paradigm used by the program, showing the pointer graph structure obtained by parsing the cactus expression in Figure 2.  A traversal of this graph naturally reconstructs the cactus string that parses into it.

Theme Exposition Parse Graph and Traverse String.png

The pointer graph in Figure 3, namely, the parse graph of a cactus expression, is the sort of thing we'll probably not be able to resist calling a cactus graph, for all the looseness of that manner of speaking, but we should keep in mind its level of abstraction lies a step further in the direction of a concrete implementation than the last thing we called by that name.  While we have them before our mind's eyes, there are several other distinctive features of cactus parse graphs we ought to notice before moving on.

In terms of idea-form structures, a cactus parse graph begins with a root idea pointing into a by‑cycle of forms, each of whose sign fields bears either a paint, in other words, a direct or indirect identifier reference, or an opening left parenthesis indicating a link to a subtended lobe of the cactus.

A lobe springs from a form whose sign field bears a left parenthesis.  That stem form has an on idea pointing into a by‑cycle of forms, exactly one of which has a sign field bearing a right parenthesis.  That last form has an on idea pointing back to the form bearing the initial left parenthesis.

Aside from the single form bearing the closing right parenthesis, the by‑cycle of a lobe may list any number of forms, each of whose sign fields bears either a comma, a paint, or an opening left parenthesis signifying a link to a more deeply subtended lobe.

Just to draw out one of the implications of this characterization and to stress the point of it, the root node can be painted and bear many lobes, but it cannot be segmented, that is, the by‑cycle corresponding to the root node can bear no commas.
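
To make those conventions concrete, the following small program is an illustrative sketch, not the original source.  It assumes the idea-form type from Exposition 1 (with as_ and on_ again standing in for as and on), takes each by-cycle to be a circular list threaded through the by fields, and prints the traversal string of whatever parse graph it is handed.  The main block builds the one-lobe cactus whose traversal string is (a).

  program traverse_sketch;

  type
    numb = 0..maxint;
    idea = ^form;
    form = record
      sign : char;
      as_, up, on_, by : idea;
      code : numb
    end;

  var
    stem, paint, shut : idea;

  function mk(c : char) : idea;
    var f : idea;
  begin
    new(f);
    f^.sign := c;
    f^.as_ := nil;  f^.up := nil;  f^.on_ := nil;  f^.by := nil;
    f^.code := 0;
    mk := f
  end;

  procedure write_cycle(start : idea); forward;

  procedure write_form(f : idea);
  begin
    write(f^.sign);               { a paint, '(', ',', or ')' }
    if f^.sign = '(' then         { a stem form subtends a lobe }
      write_cycle(f^.on_)         { walk the lobe's by-cycle }
  end;

  procedure write_cycle(start : idea);
    var f : idea;
  begin
    f := start;
    repeat
      write_form(f);
      f := f^.by                  { step along the by-cycle }
    until (f = nil) or (f = start)
  end;

  begin
    { build the one-lobe cactus whose traversal string is "(a)" }
    stem  := mk('(');
    paint := mk('a');
    shut  := mk(')');
    stem^.by  := stem;            { root by-cycle holds the stem alone }
    stem^.on_ := paint;           { stem points into the lobe's by-cycle }
    paint^.by := shut;
    shut^.by  := paint;           { close the lobe's by-cycle }
    shut^.on_ := stem;            { back-link to the opening parenthesis }
    write_cycle(stem);
    writeln
  end.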

Exposition 5

Lexical, Literal, Logical

Theme One puts cactus graphs to work in three distinct but related ways, called lexical, literal, and logical applications.  The three modes of operation employ three distinct but overlapping subsets of the broader species of cacti.  Accordingly we find ourselves working with graphs, expressions, and files of lexical, literal, and logical types, depending on the task at hand.

The logical class of cacti is the broadest, encompassing the whole species described above, of which we have already seen a typical example in its several avatars as abstract graph, pointer data structure, and string of characters suitable for storage in a text file.

Being a logical cactus is not just a matter of syntactic form; it means being subject to meaningful interpretation as the sign of a logical proposition.  To enter the logical arena a cactus expression must express something, a proposition true or false of something.

Fully addressing the logical, interpretive, semantic aspect of cactus graphs normally requires a mind-boggling mass of preliminary work on the details of their syntactic structure.  Practical, pragmatic, and especially computational considerations will eventually make that unavoidable.  For the sake of the present discussion, however, let's put that on hold and fast forward to the logical substance.

Lexical Cacti

Literal Cacti

Logical Cacti

Exposition 6

Quickly recapping the discussion so far, we started with a data structure called an idea‑form flag and adopted it as a building block for constructing a species of graph-theoretic data structures called painted and rooted cacti.  We showed how to code the abstract forms of cacti into character strings called cactus expressions and how to parse the character strings into pointer structures in computer memory.

That brought us to a choice between two expository strategies.

A full account of Theme One’s operation would describe its use of cactus graphs in three distinct ways, called lexical, literal, and logical applications.  The more logical order would approach the lexical and literal tasks first.  That is because the program's formal language learner must first acquire the vocabulary its propositional calculator interprets as logical variables.  The sequential learner operates at two levels, taking in sequences of characters it treats as strings or words plus sequences of words it treats as strands or sentences.

Finding ourselves more strongly attracted to the logical substance, however, we leave the matter of grammar to another time and turn to Theme One’s use of cactus graphs in its reasoning module to represent logical propositions on the order of Peirce's alpha graphs and Spencer Brown's calculus of indications.

Logical Cacti

Up till now we've been working to hammer out a two-edged sword of syntax, honing the syntax of cactus graphs and cactus expressions and turning it to use in taming the syntax of two-level formal languages.  But the purpose of a logical syntax is to support a logical semantics, which means, for starters, to bear interpretation as sentential signs capable of denoting objective propositions about a universe of objects.

One of the difficulties we face is that the words interpretation, meaning, semantics, and their ilk take on so many different meanings from one moment to the next of their use.  A dedicated neologician might be able to think up distinctive names for all the aspects of meaning and all the approaches to them that concern us, but I will do the best I can with the common lot of ambiguous terms, leaving it to context and intelligent interpreters to sort it out as much as possible.

The formal language of cacti is formed at such a high level of abstraction that its graphs bear at least two distinct interpretations as logical propositions.  The two interpretations concerning us here are descended from the ones C.S. Peirce called the entitative and the existential interpretations of his systems of graphical logics.

Existential Interpretation

Table 1 illustrates the existential interpretation of cactus graphs and cactus expressions by providing English translations for a few of the most basic and commonly occurring forms.

Existential Interpretation.png

Entitative Interpretation

Table 2 illustrates the entitative interpretation of cactus graphs and cactus expressions by providing English translations for a few of the most basic and commonly occurring forms.

Entitative Interpretation.png

Exposition 7

The main things to take away from the previous post are the following two ideas, one syntactic and one semantic.

  • The compositional structures of cactus graphs and cactus expressions are constructed from two kinds of connective operations.
  • There are two ways of mapping these compositional structures into the compositional structures of propositional sentences.

The two kinds of connective operations are described as follows.

  • The node connective joins a number of component cacti to a node:
Box Cj Node Connective.jpg
  • The lobe connective joins a number of component cacti to a lobe:
Box Cj Lobe Connective.jpg

The two ways of mapping cactus structures to logical meanings are summarized in Table 3, which compares the entitative and existential interpretations of the basic cactus structures, in effect, the graphical constants and connectives.

Logical Interpretations of Cactus Structures • En Ex.png
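
In summary terms, and writing $e_1, \ldots, e_k$ for the argument cacti, the two readings run as follows, in line with the standard accounts of Peirce's interpretations and with the reduction rules stated in the next post.

  • Existential interpretation.  A blank node is read as true and the rooted edge or spike, written ( ), as false.  The node connective $e_1 e_2 \ldots e_k$ asserts the conjunction of its arguments and the lobe connective $(e_1, e_2, \ldots, e_k)$ asserts that exactly one of its arguments is false.
  • Entitative interpretation.  A blank node is read as false and the rooted edge ( ) as true.  The node connective asserts the disjunction of its arguments and the lobe connective denies that exactly one of its arguments is true.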

Exposition 8

The abstract character of the cactus language relative to its logical interpretations makes it possible to give abstract rules of equivalence for transforming cacti among themselves and partitioning the space of cacti into formal equivalence classes.  The transformation rules and equivalence classes are “purely formal” in the sense of being indifferent to the logical interpretation, entitative or existential, one happens to choose.

Two definitions are useful here:

  • A reduction is an equivalence transformation which applies in the direction of decreasing graphical complexity.
  • A basic reduction is a reduction which applies to a basic connective, either a node connective or a lobe connective.

The two kinds of basic reductions are described as follows.

  • A node reduction is permitted if and only if every component cactus joined to a node itself reduces to a node.
Box Cj Node Reduction.jpg
  • A lobe reduction is permitted if and only if exactly one component cactus listed in a lobe reduces to an edge.
Box Cj Lobe Reduction.jpg

That is roughly the gist of the rules.  More formal definitions can wait for the day when we need to explain their use to a computer.
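
For a taste of how the rules might one day be explained to a computer, the following sketch decides whether a paint-free cactus expression reduces to a node or to an edge.  It is illustrative only, not the program's own code, and it adds one assumption beyond the rules as stated, namely, that a connective whose reduction is not permitted is equivalent to an edge; the example expression in the main block is likewise chosen merely for illustration.

  program reduce_sketch;

  type
    value = (node, edge);

  var
    s   : string;                 { the cactus expression under test }
    cur : integer;                { current position in s }

  function eval_cactus : value; forward;

  { Evaluate one lobe; on entry cur sits just past its '('. }
  function eval_lobe : value;
    var
      edges : integer;
      v : value;
  begin
    edges := 0;
    repeat
      v := eval_cactus;           { one component of the lobe }
      if v = edge then edges := edges + 1;
      cur := cur + 1              { step past the ',' or ')' }
    until s[cur - 1] = ')';
    if edges = 1 then             { lobe reduction is permitted }
      eval_lobe := node
    else
      eval_lobe := edge
  end;

  { Evaluate a concatenation of lobes joined to a node; stops at ',' or ')'. }
  function eval_cactus : value;
    var v : value;
  begin
    v := node;                    { a blank cactus reduces to a node }
    while (cur <= length(s)) and (s[cur] = '(') do
    begin
      cur := cur + 1;
      if eval_lobe = edge then    { node reduction is blocked }
        v := edge
    end;
    eval_cactus := v
  end;

  begin
    s := '((),(()))';             { exactly one component of the outer lobe is an edge }
    cur := 1;
    if eval_cactus = node then
      writeln(s, ' reduces to a node')
    else
      writeln(s, ' reduces to an edge')
  end.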

Examples

Jets and Sharks

Jets and Sharks 1

It is easy to spend a long time on the rudiments of learning and logic before getting down to practical applications — but I think we've circled square one long enough to expand our scope and see what the category of programs envisioned in Theme One can do with more substantial examples and exercises.

During the development of the Theme One program I tested successive implementations of its Reasoning Module or Logical Modeler on appropriate examples of logical problems current in the literature of the day.  The PDP Handbook of McClelland and Rumelhart sets out one of the wittiest gems ever to whet one's app‑titude, so I could hardly help but take it on.  The following text is a light revision of the way I set it up in the program's User Guide.

Example 5. Jets and Sharks

The propositional calculus based on the minimal negation operator can be interpreted in a way resembling the logic of activation states and competition constraints in one class of neural network models.  One way to do this is to interpret the blank or unmarked state as the resting state of a neural pool, the bound or marked state as its activated state, and to represent a mutually inhibitory pool of neurons by a proposition formed with the minimal negation operator over the pool's variables.  The manner of representation may be illustrated by transcribing a well-known example from the parallel distributed processing literature (McClelland and Rumelhart 1988) and working through a couple of the associated exercises as translated into logical graphs.

Displayed below is the text expression of a traversal string which Theme One parses into a cactus graph data structure in computer memory.  The cactus graph represents a single logical formula in propositional calculus and this proposition embodies all the logical constraints defining the Jets and Sharks data base.

Theme One Guide • Jets and Sharks • Log File.png

Jets and Sharks 2

Example 5. Jets and Sharks (cont.)

As we saw last time, Theme One reads the text file shown above and constructs a cactus graph data structure in computer memory.  The cactus graph represents a single logical formula in propositional calculus and that proposition embodies the logical constraints defining the Jets and Sharks data base.

Our cactus graph incorporates a vocabulary of 41 logical terms, each of which represents a boolean variable, so the proposition in question is a boolean function of the form $\mathbb{B}^{41} \to \mathbb{B},$ where $\mathbb{B} = \{ 0, 1 \}.$  Since $2^{41} \approx 2.2 \times 10^{12},$ we know a truth table for that proposition takes over two trillion rows and a venn diagram for it takes the same number of cells.  Topping it off, there are $2^{2^{41}}$ boolean functions of that form and our proposition is just one of them.

Measures of strategy are clearly needed to negotiate patches of cacti like those.

Jets and Sharks 3

Example 5. Jets and Sharks (cont.)

Given a representation of the Jets and Sharks universe in computer memory, we naturally want to see if the memory serves to supply the facts a well-constructed data base should.

In their PDP Handbook presentation of the Jets and Sharks example, McClelland and Rumelhart suggest several exercises for the reader to explore the performance of their neural pool memory model on the tasks of retrieval and generalization (Exercise 2.1).

Using cactus graphs or minimal negations to implement pools of mutually inhibitory neurons lends itself to neural architectures on a substantially different foundation from the garden variety connectionist models.  At a high level of abstraction, however, there is enough homology between the two orders to compare their performance on many of the same tasks.  With that in mind, I tried Theme One on a number of examples like the ones suggested by McClelland and Rumelhart.

What follows is a brief discussion of two examples as given in the original User Guide.  Next time I'll fill in more details about the examples and discuss their bearing on the larger issues at hand.

With a query on the name “ken” we obtain the following output, giving all the features associated with Ken.

Theme One Guide • Jets and Sharks • Query 1.png

With a query on the two features “college” and “sharks” we obtain the following outline of all features satisfying those constraints.

Theme One Guide • Jets and Sharks • Query 2.png

From this we discover all college Sharks are 30‑something and married.  Further, we have a complete listing of their names broken down by occupation.

References

  • McClelland, J.L. (2015), Explorations in Parallel Distributed Processing : A Handbook of Models, Programs, and Exercises, 2nd ed. (draft), Stanford Parallel Distributed Processing Lab.  Online, Section 2.3, Figure 2.1.
  • McClelland, J.L., and Rumelhart, D.E. (1988), Explorations in Parallel Distributed Processing : A Handbook of Models, Programs, and Exercises, MIT Press, Cambridge, MA.  “Figure 1. Characteristics of a number of individuals belonging to two gangs, the Jets and the Sharks”, p. 39, from McClelland (1981).
  • McClelland, J.L. (1981), “Retrieving General and Specific Knowledge From Stored Knowledge of Specifics”, Proceedings of the Third Annual Conference of the Cognitive Science Society, Berkeley, CA.

Resources

Motivation

Motivation 1

The main idea behind the Theme One program is the efficient use of graph-theoretic data structures for the tasks of “learning” and “reasoning”.

I am thinking of learning in the sense of learning about an environment, in essence, gaining information about the nature of an environment and being able to apply the information acquired to a specific purpose.

Under the heading of reasoning I am simply lumping together all the ordinary sorts of practical activities which would probably occur to most people under that name.

There is a natural relation between the tasks.  Learning the character of an environment leads to the recognition of laws which govern the environment and making full use of that recognition requires the ability to reason logically about those laws in abstract terms.

Motivation 2

A side-effect of working on the Theme One program over the course of a decade was the measure of insight it gave me into the reasons why empiricists and rationalists have so much trouble understanding each other, even when those two styles of thinking inhabit the very same soul.

The way it came about was this.  The code from which the program is currently assembled initially came from two distinct programs, ones I developed in alternate years, at first only during the summers.

In the Learner program I sought to implement a Humean empiricist style of learning algorithm for the adaptive uptake of coded sequences of occurrences in the environment, say, as codified in a formal language.  I knew all the theorems from formal language theory telling how limited any such strategy must ultimately be in terms of its generative capacity, but I wanted to explore the boundaries of that capacity in concrete computational terms.

In the Modeler program I aimed to implement a variant of Peirce's graphical syntax for propositional logic, making use of graph-theoretic extensions I had developed over the previous decade.

As I mentioned, work on those two projects proceeded in a parallel series of fits and starts through interwoven summers for a number of years, until one day it dawned on me how the Learner, one of whose aliases was Index, could be put to work helping with sundry substitution tasks the Modeler needed to carry out.

So I began integrating the functions of the Learner and the Modeler, at first still working on the two component modules in an alternating manner, but devoting a portion of effort to amalgamating their principal data structures, bringing them into convergence with each other, and unifying them over a common basis.

Another round of seasons and many changes of mind and programming style later, I arrived at a unified graph-theoretic data structure, strung like a wire through the far‑flung pearls of my programmed wit.  But the pearls I polished in alternate years maintained their shine along axes of polarization whose grains remained skew in regard to each other.  To put it more plainly, the strategies I imagined were the smartest tricks to pull from the standpoint of optimizing the program's performance on the Learning task turned out, the next year, to be the dumbest moves to pull from the standpoint of its performance on the Reasoning task.  I gradually came to appreciate that trade-off as a discovery.

Motivation 3

Sometime around 1970 John B. Eulenberg came from Stanford to direct Michigan State's Artificial Language Lab, where I would come to spend many interesting hours hanging out all through the 70s and 80s.  Along with its research program the lab did a lot of work on augmentative communication technology for limited mobility users and the observations I made there prompted the first inklings of my Learner program.

Early in that period I visited John's course in mathematical linguistics, which featured Laws of Form among its readings, along with the more standard fare of Wall, Chomsky, Jackendoff, and the Unified Science volume by Charles Morris which credited Peirce with pioneering the pragmatic theory of signs.  I learned about Zipf's Law relating the lengths of codes to their usage frequencies and I named the earliest avatar of my Learner program XyPh, partly after Zipf and playing on the xylem and phloem of its tree data structures.

Motivation 4

From Zipf's Law and the category of “things that vary inversely with frequency” I got my first brush with the idea that keeping track of usage frequencies is part and parcel of building efficient codes.

In its first application the environment the Learner has to learn is the usage behavior of its user, as given by finite sequences of characters from a finite alphabet, which sequences of characters might as well be called “words”, together with finite sequences of those words which might as well be called “phrases” or “sentences”.  In other words, Job One for the Learner is the job of constructing a “user model”.

In that frame of mind we are not seeking anything so grand as a Universal Induction Algorithm but simply looking for any approach that gives us a leg up, complexity-wise, in Interactive Real Time.

Motivation 5

Since I'm working from decades-old memories of first inklings I thought I might peruse the web for current information about Zipf's Law.  I see there is now something called the Zipf–Mandelbrot (and sometimes –Pareto) Law and that was interesting because my wife Susan Awbrey made use of Mandelbrot's ideas about self-similarity in her dissertation and communicated with him about it.  So there's more to read up on.

Just off-hand, though, I think my Learner is dealing with a different problem.  It has more to do with the savings in effort a learner gets by anticipating future experiences based on its record of past experiences than the savings it gets by minimizing bits of storage as far as mechanically possible.  There is still a type of compression involved but it's more like Korzybski's “time-binding” than space-savings proper.  Speaking of old memories …

The other difference I see is that Zipf's Law applies to an established and preferably large corpus of linguistic material, while my Learner has to start from scratch, accumulating experience over time, making the best of whatever data it has at the outset and every moment thereafter.

Motivation 6

Comments I made in reply to a correspondent's questions about delimiters and tokenizing in the Learner module may be worth sharing here.

In one of the projects I submitted toward a Master's in psychology I used the Theme One program to analyze samples of data from my advisor's funded research study on family dynamics.  In one phase of the study observers viewed video-taped sessions of family members (parent and child) interacting in various modes (“play” or “work”) and coded qualitative features of each moment's activity over a period of time.

The following page describes the application in more detail and reflects on its implications for the conduct of scientific inquiry in general.

In this application a “phrase” or “string” is a fixed-length sequence of qualitative features and a “clause” or “strand” is a sequence of such phrases delimited by what the observer judges to be a significant pause in the action.

In the qualitative research phases of the study one is simply attempting to discern any significant or recurring patterns in the data one possibly can.

In this case the observers are tokenizing the observations according to a codebook that has passed enough intercoder reliability studies to afford them all a measure of confidence it captures meaningful aspects of whatever reality is passing before their eyes and ears.

Discussion

Theme One • A Program Of Inquiry

Applications

Documentation

References

  • Awbrey, S.M., and Awbrey, J.L. (May 1991), “An Architecture for Inquiry • Building Computer Platforms for Discovery”, Proceedings of the Eighth International Conference on Technology and Education, Toronto, Canada, pp. 874–875.  Online.
  • Awbrey, J.L., and Awbrey, S.M. (January 1991), “Exploring Research Data Interactively • Developing a Computer Architecture for Inquiry”, Poster presented at the Annual Sigma Xi Research Forum, University of Texas Medical Branch, Galveston, TX.
  • Awbrey, J.L., and Awbrey, S.M. (August 1990), “Exploring Research Data Interactively • Theme One : A Program of Inquiry”, Proceedings of the Sixth Annual Conference on Applications of Artificial Intelligence and CD-ROM in Education and Training, Society for Applied Learning Technology, Washington, DC, pp. 9–15.  Online.