Inquiry Driven Systems • Part 3

Author: Jon Awbrey

Cactus Language • 1

Thus, what looks to us like a sphere of scientific knowledge more accurately should be represented as the inside of a highly irregular and spiky object, like a pincushion or porcupine, with very sharp extensions in certain directions, and virtually no knowledge in immediately adjacent areas.  If our intellectual gaze could shift slightly, it would alter each quill's direction, and suddenly our entire reality would change.

— Herbert J. Bernstein, “Idols of Modern Science”, [HJB, 38]

In this and the four subsections that follow, I describe a calculus for representing propositions as sentences, in other words, as syntactically defined sequences of signs, and for manipulating these sentences chiefly in the light of their semantically defined contents, in other words, with respect to their logical values as propositions. In their computational representation, the expressions of this calculus parse into a class of tree-like data structures called painted cacti. This is a family of graph-theoretic data structures that can be observed to have especially nice properties, turning out to be not only useful from a computational standpoint but also quite interesting from a theoretical point of view. The rest of this subsection serves to motivate the development of this calculus and treats a number of general issues that surround the topic.

In order to facilitate the use of propositions as indicator functions it helps to acquire a flexible notation for referring to propositions in that light, for interpreting sentences in a corresponding role, and for negotiating the requirements of mutual sense between the two domains. If none of the formalisms that are readily available or in common use are able to meet all of the design requirements that come to mind, then it is necessary to contemplate the design of a new language that is especially tailored to the purpose. In the present application, there is a pressing need to devise a general calculus for composing propositions, computing their values on particular arguments, and inverting their indications to arrive at the sets of things in the universe that are indicated by them.

For computational purposes, it is convenient to have a middle ground or an intermediate language for negotiating between the koine of sentences regarded as strings of literal characters and the realm of propositions regarded as objects of logical value, even if this renders it necessary to introduce an artificial medium of exchange between these two domains. If one envisions these computations to be carried out in any organized fashion, and ultimately or partially by means of the familiar sorts of machines, then the strings that express these logical propositions are likely to find themselves parsed into tree-like data structures at some stage of the game. With regard to their abstract structures as graphs, there are several species of graph-theoretic data structures that can be used to accomplish this job in a reasonably effective and efficient way.

Over the course of this project, I plan to use two species of graphs:

  1. Painted And Rooted Cacti (PARCAI).
  2. Painted And Rooted Conifers (PARCOI).

For now, it is enough to discuss the former class of data structures, leaving the consideration of the latter class to a part of the project where their distinctive features are key to developments at that stage. Accordingly, within the context of the current patch of discussion, or until it becomes necessary to attach further notice to the conceivable varieties of parse graphs, the acronym "PARC" is sufficient to indicate the pertinent genus of abstract graphs that are under consideration.

By way of making these tasks feasible to carry out on a regular basis, a prospective language designer is required not only to supply a fluent medium for the expression of propositions, but further to accompany the assertions of their sentences with a canonical mechanism for teasing out the fibers of their indicator functions. Accordingly, with regard to a body of conceivable propositions, one needs to furnish a standard array of techniques for following the threads of their indications from their objective universe to their values for the mind and back again, that is, for tracing the clues that sentences provide from the universe of their objects to the signs of their values, and, in turn, from signs to objects. Ultimately, one seeks to render propositions so functional as indicators of sets and so essential for examining the equality of sets that they can constitute a veritable criterion for the practical conceivability of sets. Tackling this task requires me to introduce a number of new definitions and a collection of additional notational devices, to which I now turn.

Depending on whether a formal language is called by the type of sign that makes it up or whether it is named after the type of object that its signs are intended to denote, one may refer to this cactus language as a sentential calculus or as a propositional calculus, respectively.

When the syntactic definition of the language is well enough understood, then the language can begin to acquire a semantic function. In natural circumstances, the syntax and the semantics are likely to be engaged in a process of co-evolution, whether in ontogeny or in phylogeny, that is, the two developments probably form parallel sides of a single bootstrap. But this is not always the easiest way, at least, at first, to formally comprehend the nature of their action or the power of their interaction.

According to the customary mode of formal reconstruction, the language is first presented in terms of its syntax, in other words, as a formal language of strings called sentences, amounting to a particular subset of the possible strings that can be formed on a finite alphabet of signs. A syntactic definition of the cactus language, one that proceeds along purely formal lines, is carried out in the next Subsection. After that, the development of the language's more concrete aspects can be seen as a matter of defining two functions:

  1. The first is a function that takes each sentence of the language into a computational data structure, to be exact, a tree-like parse graph called a painted cactus.
  2. The second is a function that takes each sentence of the language, or its interpolated parse graph, into a logical proposition, in effect, ending up with an indicator function as the object denoted by the sentence.

The discussion of syntax brings up a number of associated issues that have to be clarified before going on. These are questions of style, that is, the sort of description, grammar, or theory that one finds available or chooses as preferable for a given language. These issues are discussed in the Subsection after next.

There is an aspect of syntax that is so schematic in its basic character that it can be conveyed by computational data structures, so algorithmic in its uses that it can be automated by routine mechanisms, and so fixed in its nature that its practical exploitation can be served by the usual devices of computation. Because it involves the transformation of signs, it can be recognized as an aspect of semiotics. Since it can be carried out in abstraction from meaning, it is not up to the level of semantics, much less a complete pragmatics, though it does incline to the pragmatic aspects of computation that are auxiliary and incidental to the human use of language. Therefore, I refer to this aspect of formal language use as the algorithmics or the mechanics of language processing. A mechanical conversion of the cactus language into its associated data structures is discussed in a later Subsection.

In the usual way of proceeding on formal grounds, meaning is added by giving each grammatical sentence, or each syntactically distinguished string, an interpretation as a logically meaningful sentence, in effect, equipping or providing each abstractly well-formed sentence with a logical proposition for it to denote. A semantic interpretation of the cactus language is carried out in a later Subsection.


Picture two different configurations of such an irregular shape, superimposed on each other in space, like a double exposure photograph. Of the two images, the only part which coincides is the body. The two different sets of quills stick out into very different regions of space. The objective reality we see from within the first position, seemingly so full and spherical, actually agrees with the shifted reality only in the body of common knowledge. In every direction in which we look at all deeply, the realm of discovered scientific truth could be quite different. Yet in each of those two different situations, we would have thought the world complete, firmly known, and rather round in its penetration of the space of possible knowledge.

— Herbert J. Bernstein, "Idols of Modern Science", [HJB, 38]

In this Subsection, I describe the syntax of a family of formal languages that I intend to use as a sentential calculus, and thus to interpret for the purpose of reasoning about propositions and their logical relations. In order to carry out the discussion, I need a way of referring to signs as if they were objects like any others, in other words, as the sorts of things that are subject to being named, indicated, described, discussed, and renamed if necessary, that can be placed, arranged, and rearranged within a suitable medium of expression, or else manipulated in the mind, that can be articulated and decomposed into their elementary signs, and that can be strung together in sequences to form complex signs. Signs that have signs as their objects are called higher order signs, and this is a topic that demands an apt formalization, but in due time. The present discussion requires a quicker way to get into this subject, even if it takes informal means that cannot be made absolutely precise.

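As a temporary notation, let the relationship between a particular sign $s$ and a particular object $o,$ namely, the fact that $s$ denotes $o$ or the fact that $o$ is denoted by $s,$ be symbolized in one of the following two ways:

  1. $s \rightarrow o,$ read "$s$ denotes $o$".
  2. $o \leftarrow s,$ read "$o$ is denoted by $s$".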

Now consider the following paradigm:

When I say that the sign "blank" denotes the sign " ", it means that the string of characters inside the first pair of quotation marks can be used as another name for the string of characters inside the second pair of quotes. In other words, "blank" is a higher order sign whose object is " ", and the string of five characters inside the first pair of quotation marks is a sign at a higher level of signification than the string of one character inside the second pair of quotation marks. This relationship can be abbreviated in either one of the following ways:

Using the raised dot "·" as a sign to mark the articulation of a quoted string into a sequence of possibly shorter quoted strings, and thus to mark the concatenation of a sequence of quoted strings into a possibly larger quoted string, one can write:

This usage allows us to refer to the blank as a type of character, and also to refer to any blank we choose as a token of this type, referring to either of them in a marked way, but without the use of quotation marks, as I just did. Now, since a blank is just what the name "blank" names, it is possible to represent the denotation of the sign " " by the name "blank" in the form of an identity between the named objects, thus:

With these kinds of identity in mind, it is possible to extend the use of the "·" sign to mark the articulation of either named or quoted strings into both named and quoted strings. For example:

A few definitions from formal language theory are required at this point.

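An alphabet is a finite set of signs, typically, $\mathfrak{A} = \{ \mathfrak{a}_1, \ldots, \mathfrak{a}_n \}.$

A string over an alphabet $\mathfrak{A}$ is a finite sequence of signs from $\mathfrak{A}.$

The length of a string is just its length as a sequence of signs.

The empty string is the unique sequence of length 0. It is sometimes denoted by an empty pair of quotation marks, "", but more often by the Greek symbols epsilon or lambda, $\varepsilon$ or $\lambda.$

A sequence of length $k$ is typically presented in the concatenated forms:

  $s_1 s_2 \ldots s_{k-1} s_k$  or  $s_1 \cdot s_2 \cdot \ldots \cdot s_{k-1} \cdot s_k$

with $s_j \in \mathfrak{A}$ for all $j = 1, \ldots, k.$

Two alternative notations are often useful:

  $\varepsilon$ = "" = the empty string.
  $\underline{\varepsilon}$ = $\{ \varepsilon \}$ = the language consisting of a single empty string.

The Kleene star $\mathfrak{A}^*$ of alphabet $\mathfrak{A}$ is the set of all strings over $\mathfrak{A}.$ In particular, $\mathfrak{A}^*$ includes among its elements the empty string $\varepsilon.$

The Kleene plus $\mathfrak{A}^+$ of an alphabet $\mathfrak{A}$ is the set of all positive length strings over $\mathfrak{A},$ in other words, everything in $\mathfrak{A}^*$ but the empty string.

A formal language $\mathfrak{L}$ over an alphabet $\mathfrak{A}$ is a subset of $\mathfrak{A}^*.$ In brief, $\mathfrak{L} \subseteq \mathfrak{A}^*.$ If $s$ is a string over $\mathfrak{A}$ and if $s$ is an element of $\mathfrak{L},$ then it is customary to call $s$ a sentence of $\mathfrak{L}.$ Thus, a formal language $\mathfrak{L}$ is defined by specifying its elements, which amounts to saying what it means to be a sentence of $\mathfrak{L}.$

One last device turns out to be useful in this connection. If $s$ is a string that ends with a sign $t,$ then $s \cdot t^{-1}$ is the string that results by deleting from $s$ the terminal $t.$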
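The definitions above can be tried out in a few lines of Python. This is only a minimal sketch, assuming each sign is a single character and each string is an ordinary Python `str`; the function names `strings_up_to` and `delete_terminal` are hypothetical helpers introduced for this illustration. Since the Kleene star is infinite, the sketch lists only the strings up to a chosen length.

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """List every string over `alphabet` of length 0..max_len.

    This is a finite slice of the Kleene star, ordered by length.
    """
    return ["".join(p)
            for n in range(max_len + 1)
            for p in product(alphabet, repeat=n)]

def delete_terminal(s, t):
    """Return s with its terminal sign t deleted (the device s . t^-1).

    Raises ValueError when s does not end with t, since the device
    is only defined for strings that end with that sign.
    """
    if not t or not s.endswith(t):
        raise ValueError(f"{s!r} does not end with {t!r}")
    return s[:len(s) - len(t)]

# Slice of the Kleene star over a two-sign alphabet (an example alphabet).
star_slice = strings_up_to(["a", "b"], 2)
# The matching slice of the Kleene plus drops the empty string.
plus_slice = [s for s in star_slice if s != ""]
```

The empty string appears exactly once in `star_slice`, as the unique sequence of length 0, and not at all in `plus_slice`.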

In this context, I make the following distinction:

  1. To delete an appearance of a sign is to replace it with an appearance of the empty string "".
  2. To erase an appearance of a sign is to replace it with an appearance of the blank symbol " ".
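The difference between the two operations shows up in the length of the result. The following sketch assumes single-character signs and, for simplicity, acts on every appearance of the sign at once rather than on one chosen appearance; `delete_sign` and `erase_sign` are hypothetical names used only here.

```python
def delete_sign(s, sign):
    """Replace each appearance of `sign` with the empty string."""
    return s.replace(sign, "")

def erase_sign(s, sign):
    """Replace each appearance of `sign` with the blank symbol."""
    return s.replace(sign, " ")

expression = "(a,b)"
deleted = delete_sign(expression, "a")  # the string gets shorter
erased = erase_sign(expression, "a")    # the length is preserved
```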

A token is a particular appearance of a sign.

The informal mechanisms that have been illustrated in the immediately preceding discussion are enough to equip the rest of this discussion with a moderately exact description of the so-called cactus language that I intend to use in both my conceptual and my computational representations of the minimal formal logical system that is variously known to sundry communities of interpretation as propositional logic, sentential calculus, or more inclusively, zeroth order logic (ZOL).

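The painted cactus language $\mathfrak{C}$ is actually a parameterized family of languages, consisting of one language $\mathfrak{C}(\mathfrak{P})$ for each set $\mathfrak{P}$ of paints.

The alphabet $\mathfrak{A} = \mathfrak{M} \cup \mathfrak{P}$ is the disjoint union of two sets of symbols:

  1. $\mathfrak{M}$ is the alphabet of measures, the set of punctuation marks, or the collection of syntactic constants that is common to all of the languages $\mathfrak{C}(\mathfrak{P}).$ This set of signs is given as follows:

     $\mathfrak{M} = \{ \mathfrak{m}_1, \mathfrak{m}_2, \mathfrak{m}_3, \mathfrak{m}_4 \}$ = { " ", "(", ",", ")" }

  2. $\mathfrak{P}$ is the palette, the alphabet of paints, or the collection of syntactic variables that is peculiar to the language $\mathfrak{C}(\mathfrak{P}).$ This set of signs is given as follows:

     $\mathfrak{P} = \{ \mathfrak{p}_j : j \in J \}$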

The easiest way to define the language $\mathfrak{C}(\mathfrak{P})$ is to indicate the general sorts of operations that suffice to construct the greater share of its sentences from the specified few of its sentences that require a special election. In accord with this manner of proceeding, I introduce a family of operations on strings of $\mathfrak{A}^*$ that are called syntactic connectives. If the strings on which they operate are exclusively sentences of $\mathfrak{C}(\mathfrak{P}),$ then these operations are tantamount to sentential connectives, and if the syntactic sentences, considered as abstract strings of meaningless signs, are given a semantics in which they denote propositions, considered as indicator functions over some universe, then these operations amount to propositional connectives.

Rather than presenting the most concise description of these languages right from the beginning, it serves comprehension to develop a picture of their forms in gradual stages, starting from the most natural ways of viewing their elements, if somewhat at a distance, and working through the most easily grasped impressions of their structures, if not always the sharpest acquaintances with their details.

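The first step is to define two sets of basic operations on strings of $\mathfrak{A}^*.$

  1. The concatenation of one string $s_1$ is just the string $s_1.$

    The concatenation of two strings $s_1, s_2$ is the string $s_1 \cdot s_2.$

    The concatenation of the $k$ strings $s_1, \ldots, s_k$ is the string of the form $s_1 \cdot \ldots \cdot s_k.$

  2. The surcatenation of one string $s_1$ is the string $\text{"("} \cdot s_1 \cdot \text{")"}.$

    The surcatenation of two strings $s_1, s_2$ is $\text{"("} \cdot s_1 \cdot \text{","} \cdot s_2 \cdot \text{")"}.$

    The surcatenation of the $k$ strings $s_1, \ldots, s_k$ is the string of the form $\text{"("} \cdot s_1 \cdot \text{","} \cdot \ldots \cdot \text{","} \cdot s_k \cdot \text{")"}.$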
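Both operations are easy to try out on ordinary strings. The sketch below assumes that surcatenation encloses its arguments in a pair of parentheses and separates them with commas, which is an assumption about the measures of the cactus alphabet; the Python names `conc` and `surc` are chosen for this illustration only.

```python
def conc(*strings):
    """Concatenation: write the strings one after another."""
    return "".join(strings)

def surc(*strings):
    """Surcatenation: enclose the strings in parentheses,
    separating consecutive arguments with commas."""
    return "(" + ",".join(strings) + ")"

one_place = surc("a")             # a single argument
two_place = surc("a", "b")        # two arguments
nested = surc("a", surc("b"), "")  # arguments may be nested or empty
```

An empty string is a legitimate argument, so `nested` ends with a comma directly followed by the closing parenthesis.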

These definitions can be made a little more succinct by defining the following sorts of generic operators on strings:

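  1. The concatenation $\mathrm{Conc}_{j=1}^{k}$ of the sequence of $k$ strings $(s_j)_{j=1}^{k}$ is defined recursively as follows:
    1. $\mathrm{Conc}_{j=1}^{1} s_j = s_1.$
    2. For $\ell > 1,$ $\mathrm{Conc}_{j=1}^{\ell} s_j = \mathrm{Conc}_{j=1}^{\ell-1} s_j \cdot s_\ell.$

  2. The surcatenation $\mathrm{Surc}_{j=1}^{k}$ of the sequence of $k$ strings $(s_j)_{j=1}^{k}$ is defined recursively as follows:
    1. $\mathrm{Surc}_{j=1}^{1} s_j = \text{"("} \cdot s_1 \cdot \text{")"}.$
    2. For $\ell > 1,$ $\mathrm{Surc}_{j=1}^{\ell} s_j = \mathrm{Surc}_{j=1}^{\ell-1} s_j \cdot \text{")"}^{-1} \cdot \text{","} \cdot s_\ell \cdot \text{")"}.$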
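A recursive reading of the two operators might be sketched as follows. The sketch assumes that the recursive step for surcatenation deletes the closing parenthesis of the shorter surcatenation, appends a comma and the next string, and then closes the parenthesis again; `conc_rec` and `surc_rec` are hypothetical names, and both functions expect a non-empty list of strings.

```python
def conc_rec(strings):
    """Recursive concatenation of a non-empty list of strings."""
    if not strings:
        raise ValueError("expected at least one string")
    if len(strings) == 1:
        return strings[0]
    # Concatenate all but the last string, then append the last one.
    return conc_rec(strings[:-1]) + strings[-1]

def surc_rec(strings):
    """Recursive surcatenation of a non-empty list of strings."""
    if not strings:
        raise ValueError("expected at least one string")
    if len(strings) == 1:
        return "(" + strings[0] + ")"
    shorter = surc_rec(strings[:-1])
    # Delete the terminal ")", then add "," + next string + ")".
    return shorter[:-1] + "," + strings[-1] + ")"
```

For example, `surc_rec(["a", "b", "c"])` builds `(a)`, then `(a,b)`, then `(a,b,c)`.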

The definitions of these syntactic operations can now be organized in a slightly better fashion by making a few additional conventions and auxiliary definitions.

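  1. The conception of the $k$-place concatenation operation can be extended to include its natural prequel:

     $\mathrm{Conc}^0$ = "" = the empty string.

    Next, the construction of the $k$-place concatenation can be broken into stages by means of the following conceptions:

    1. The precatenation $\mathrm{Prec}(s_1, s_2)$ of the two strings $s_1, s_2$ is the string that is defined as follows:

       $\mathrm{Prec}(s_1, s_2) = s_1 \cdot s_2.$

    2. The concatenation of the sequence of $k$ strings $s_1, \ldots, s_k$ can now be defined as an iterated precatenation over the sequence of $k+1$ strings that begins with the string $s_0 = \mathrm{Conc}^0$ = "" and then continues on through the other $k$ strings:

      1. $\mathrm{Conc}_{j=0}^{0} s_j = \mathrm{Conc}^0$ = "".
      2. For $\ell > 0,$ $\mathrm{Conc}_{j=0}^{\ell} s_j = \mathrm{Prec}(\mathrm{Conc}_{j=0}^{\ell-1} s_j, s_\ell).$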

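  2. The conception of the $k$-place surcatenation operation can be extended to include its natural "prequel":

     $\mathrm{Surc}^0$ = "()".

    Finally, the construction of the $k$-place surcatenation can be broken into stages by means of the following conceptions:

    1. A subclause in $\mathfrak{A}^*$ is a string that ends with a ")".

    2. The subcatenation $\mathrm{Subc}(s_1, s_2)$ of a subclause $s_1$ by a string $s_2$ is the string that is defined as follows:

       $\mathrm{Subc}(s_1, s_2) = s_1 \cdot \text{")"}^{-1} \cdot \text{","} \cdot s_2 \cdot \text{")"}.$

    3. The surcatenation of the $k$ strings $s_1, \ldots, s_k$ can now be defined as an iterated subcatenation over the sequence of $k+1$ strings that starts with the string $s_0 = \mathrm{Surc}^0$ = "()" and then continues on through the other $k$ strings:

      1. $\mathrm{Surc}_{j=0}^{0} s_j = \mathrm{Surc}^0$ = "()".
      2. For $\ell > 0,$ $\mathrm{Surc}_{j=0}^{\ell} s_j = \mathrm{Subc}(\mathrm{Surc}_{j=0}^{\ell-1} s_j, s_\ell).$

Notice that the expressions $\mathrm{Conc}_{j=0}^{0} s_j$ and $\mathrm{Surc}_{j=0}^{0} s_j$ are defined in such a way that the respective operators $\mathrm{Conc}^0$ and $\mathrm{Surc}^0$ simply ignore, in the manner of constants, whatever sequences of strings $s_j$ may be listed as their ostensible arguments.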
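The staged constructions can be pictured as folds over a list of strings. The following is a rough sketch under stated assumptions: precatenation joins two strings, subcatenation reopens a subclause by removing its closing parenthesis before adding a comma and the next string, and the concatenation fold is seeded with the empty string. The surcatenation fold in this sketch is seeded with the one-place form `(s1)` whenever there is at least one argument, an assumption of the sketch chosen so that its output agrees with the comma-separated form of a surcatenation. All function names are hypothetical.

```python
from functools import reduce

def prec(s1, s2):
    """Precatenation: s1 followed by s2."""
    return s1 + s2

def subc(s1, s2):
    """Subcatenation of a subclause s1 by a string s2."""
    if not s1.endswith(")"):
        raise ValueError("the first argument must be a subclause")
    # Delete the terminal ")", then add "," + s2 + ")".
    return s1[:-1] + "," + s2 + ")"

def conc_iter(strings):
    """Concatenation as an iterated precatenation, seeded with ""."""
    return reduce(prec, strings, "")

def surc_iter(strings):
    """Surcatenation as an iterated subcatenation.

    With no arguments the result is the constant "()"; otherwise the
    fold is seeded with the one-place form (an assumption, see above).
    """
    if not strings:
        return "()"
    seed = "(" + strings[0] + ")"
    return reduce(subc, strings[1:], seed)
```

With no arguments the two folds return their constant values, `""` and `"()"`, whatever else is going on around them.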

Having defined the basic operations of concatenation and surcatenation on arbitrary strings, in effect, giving them operational meaning for the all-inclusive language $\mathfrak{L} = \mathfrak{A}^*,$ it is time to adjoin the notion of a more discriminating grammaticality, in other words, a more properly restrictive concept of a sentence.

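If $\mathfrak{L}$ is an arbitrary formal language over an alphabet of the sort that we are talking about, that is, an alphabet of the form $\mathfrak{A} = \mathfrak{M} \cup \mathfrak{P},$ then there are a number of basic structural relations that can be defined on the strings of $\mathfrak{L}.$

1. $s$ is the concatenation of $s_1$ and $s_2$ in $\mathfrak{L}$ if and only if
  $s_1$ is a sentence of $\mathfrak{L},$ $s_2$ is a sentence of $\mathfrak{L},$ and $s = s_1 \cdot s_2.$
2. $s$ is the concatenation of the $k$ strings $s_1, \ldots, s_k$ in $\mathfrak{L},$
  if and only if $s_j$ is a sentence of $\mathfrak{L}$ for all $j = 1, \ldots, k,$ and $s = \mathrm{Conc}_{j=1}^{k} s_j = s_1 \cdot \ldots \cdot s_k.$
3. $s$ is the discatenation of $s_1$ by $t$ if and only if
  $s_1$ is a sentence of $\mathfrak{L},$ $t$ is an element of $\mathfrak{A},$ and $s_1 = s \cdot t.$
  When this is the case, one more commonly writes: $s = s_1 \cdot t^{-1}.$
4. $s$ is a subclause of $\mathfrak{L}$ if and only if
  $s$ is a sentence of $\mathfrak{L}$ and $s$ ends with a ")".
5. $s$ is the subcatenation of $s_1$ by $s_2$ if and only if
  $s_1$ is a subclause of $\mathfrak{L},$ $s_2$ is a sentence of $\mathfrak{L},$ and $s = s_1 \cdot \text{")"}^{-1} \cdot \text{","} \cdot s_2 \cdot \text{")"}.$
6. $s$ is the surcatenation of the $k$ strings $s_1, \ldots, s_k$ in $\mathfrak{L},$
  if and only if $s_j$ is a sentence of $\mathfrak{L}$ for all $j = 1, \ldots, k,$ and $s = \mathrm{Surc}_{j=1}^{k} s_j = \text{"("} \cdot s_1 \cdot \text{","} \cdot \ldots \cdot \text{","} \cdot s_k \cdot \text{")"}.$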

The converses of these decomposition relations are tantamount to the corresponding forms of composition operations, making it possible for these complementary forms of analysis and synthesis to articulate the structures of strings and sentences in two directions.
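As one illustration of the analytic direction, a surcatenation can be taken apart into the strings it was built from by splitting at the commas that sit at the outermost level of nesting. This is a minimal sketch, assuming the parenthesis-and-comma form of surcatenation; `split_surcatenation` is a hypothetical name, and the function checks only the balance of parentheses, not full sentencehood of the parts.

```python
def split_surcatenation(s):
    """Split "(s1,...,sk)" into the list [s1, ..., sk].

    Only commas at nesting depth zero inside the outer pair of
    parentheses count as separators.
    """
    if len(s) < 2 or s[0] != "(" or s[-1] != ")":
        raise ValueError("not of the form ( ... )")
    parts, current, depth = [], [], 0
    for ch in s[1:-1]:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # The opening "(" was closed early, as in "(a)(b)".
                raise ValueError("outer parentheses do not match")
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    parts.append("".join(current))
    return parts

parts = split_surcatenation("(a,(b,c))")
rebuilt = "(" + ",".join(parts) + ")"  # synthesis undoes the analysis
```

Joining the parts with commas and re-enclosing them in parentheses recovers the original string, which is the two-directional articulation described above.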

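The painted cactus language with paints in the set $\mathfrak{P} = \{ \mathfrak{p}_j : j \in J \}$ is the formal language $\mathfrak{L} = \mathfrak{C}(\mathfrak{P}) \subseteq \mathfrak{A}^* = (\mathfrak{M} \cup \mathfrak{P})^*$ that is defined as follows:

PC 1. The blank symbol $\mathfrak{m}_1$ = " " is a sentence.
PC 2. The paint $\mathfrak{p}_j$ is a sentence, for each $j$ in $J.$
PC 3. $\mathrm{Conc}^0$ = "" and $\mathrm{Surc}^0$ = "()" are sentences.
PC 4. For each positive integer $k,$
  if $s_1, \ldots, s_k$ are sentences,
  then $\mathrm{Conc}_{j=1}^{k} s_j$ is a sentence,
  and $\mathrm{Surc}_{j=1}^{k} s_j$ is a sentence.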

As usual, saying that $s$ is a sentence is just a conventional way of stating that the string $s$ belongs to the relevant formal language $\mathfrak{L}.$ An individual sentence of $\mathfrak{C}(\mathfrak{P}),$ for any palette $\mathfrak{P},$ is referred to as a painted and rooted cactus expression (PARCE) on the palette $\mathfrak{P},$ or a cactus expression, for short. Anticipating the forms that the parse graphs of these PARCE's will take, to be described in the next Subsection, the language $\mathfrak{L} = \mathfrak{C}(\mathfrak{P})$ is also described as the set $\mathrm{PARCE}(\mathfrak{P})$ of PARCE's on the palette $\mathfrak{P},$ more generically, as the PARCE's that constitute the language $\mathrm{PARCE}.$

A bare PARCE, a bit loosely referred to as a bare cactus expression, is a PARCE on the empty palette $\mathfrak{P} = \varnothing.$ A bare PARCE is a sentence in the bare cactus language, $\mathfrak{C}^0 = \mathfrak{C}(\varnothing) = \mathrm{PARCE}^0 = \mathrm{PARCE}(\varnothing).$ This set of strings, regarded as a formal language in its own right, is a sublanguage of every cactus language $\mathfrak{C}(\mathfrak{P}).$ A bare cactus expression is commonly encountered in practice when one has occasion to start with an arbitrary PARCE and then finds a reason to delete or to erase all of its paints.
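Membership in a cactus language can be checked mechanically. The recognizer below is a sketch under several assumptions that go beyond the text: paints are single characters, the palette is disjoint from the four punctuation marks (blank, the two parentheses, and the comma), and a sentence is read as a possibly empty run of blanks, paints, and parenthesized comma-separated lists of sentences. The name `is_sentence` is hypothetical.

```python
def is_sentence(s, palette):
    """Return True if s is a cactus expression on `palette`.

    `palette` is a set of single-character paints that excludes
    the characters " ", "(", "," and ")".
    """
    def parse_sentence(i):
        # Consume a possibly empty run of blanks, paints, and
        # parenthesized groups; return the first unconsumed index.
        while i < len(s):
            ch = s[i]
            if ch == " " or ch in palette:
                i += 1
            elif ch == "(":
                i = parse_group(i)
            else:
                break
        return i

    def parse_group(i):
        # s[i] is "(": consume "(" sentence ("," sentence)* ")".
        i = parse_sentence(i + 1)
        while i < len(s) and s[i] == ",":
            i = parse_sentence(i + 1)
        if i < len(s) and s[i] == ")":
            return i + 1
        raise ValueError("missing closing parenthesis")

    try:
        return parse_sentence(0) == len(s)
    except ValueError:
        return False

paints = {"a", "b"}
bare = set()  # the empty palette gives the bare cactus language
```

A bare expression such as `"(())"` is accepted on the empty palette and on every other palette as well, in line with the sublanguage remark above.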

Only one thing remains to cast this description of the cactus language into a form that is commonly found acceptable. As presently formulated, the principle PC 4 appears to be attempting to define an infinite number of new concepts all in a single step, at least, it appears to invoke the indefinitely long sequences of operators, $\mathrm{Conc}^k$ and $\mathrm{Surc}^k,$ for all $k > 0.$ As a general rule, one prefers to have an effectively finite description of conceptual objects, and this means restricting the description to a finite number of schematic principles, each of which involves a finite number of schematic effects, that is, a finite number of schemata that explicitly relate conditions to results.

A start in this direction, taking steps toward an effective description of the cactus language, a finitary conception of its membership conditions, and a bounded characterization of a typical sentence in the language, can be made by recasting the present description of these expressions into the pattern of what is called, more or less roughly, a formal grammar.

A notation in the style of $S :> T$ is now introduced, to be read in a variety of ways, among them "$S$ covers $T$".

The form $S :> T$ is here recruited for polymorphic employment in at least the following types of roles:

  1. To signify that an individually named or quoted string $T$ is being typed as a sentence $S$ of the language of interest $\mathfrak{L}.$
  2. To express the fact or to make the assertion that each member of a specified set of strings $T \subseteq \mathfrak{A}^*$ also belongs to the syntactic category $S,$ the one that qualifies a string as being a sentence in the relevant formal language $\mathfrak{L}.$
  3. To specify the intension or to signify the intention that every string that fits the conditions of the abstract type $T$ must also fall under the grammatical heading of a sentence, as indicated by the type $S,$ all within the target language $\mathfrak{L}.$

In these types of situation the letter $S,$ that signifies the type of a sentence in the language of interest, is called the initial symbol or the sentence symbol of a candidate formal grammar for the language, while any number of letters like $T,$ signifying other types of strings that are necessary to a reasonable account or a rational reconstruction of the sentences that belong to the language, are collectively referred to as intermediate symbols.

Combining the singleton set $\{ S \}$ whose sole member is the initial symbol with the set $\mathfrak{Q}$ that assembles together all of the intermediate symbols results in the set $\{ S \} \cup \mathfrak{Q}$ of non-terminal symbols. Completing the package, the alphabet $\mathfrak{A}$ of the language is also known as the set of terminal symbols. In this discussion, I will adopt the convention that $\mathfrak{Q}$ is the set of intermediate symbols, but I will often use $q$ as a typical variable that ranges over all of the non-terminal symbols, $q \in \{ S \} \cup \mathfrak{Q}.$ Finally, it is convenient to refer to all of the symbols in $\{ S \} \cup \mathfrak{Q} \cup \mathfrak{A}$ as the augmented alphabet of the prospective grammar for the language, and accordingly to describe the strings in $( \{ S \} \cup \mathfrak{Q} \cup \mathfrak{A} )^*$ as the augmented strings, in effect, expressing the forms that are superimposed on a language by one of its conceivable grammars. In certain settings it becomes desirable to separate the augmented strings that contain the symbol $S$ from all other sorts of augmented strings. In these situations the strings in the disjoint union $\{ S \} \cup ( \mathfrak{Q} \cup \mathfrak{A} )^*$ are known as the sentential forms of the associated grammar.

In forming a grammar for a language, statements of the form $W :> W',$ where $W$ and $W'$ are augmented strings or sentential forms of specified types that depend on the style of the grammar that is being sought, are variously known as characterizations, covering rules, productions, rewrite rules, subsumptions, transformations, or typing rules. These are collected together into a set $\mathfrak{K}$ that serves to complete the definition of the formal grammar in question.

Correlative with the use of this notation, an expression of the form $T <: S,$ read to say that $T$ is covered by $S,$ can be interpreted to say that $T$ is of the type $S.$ Depending on the context, this can be taken in either one of two ways:

  1. Treating $T$ as a string variable, it means that the individual string $T$ is typed as $S.$
  2. Treating $T$ as a type name, it means that any instance of the type $T$ also falls under the type $S.$

In accordance with these interpretations, an expression of the form $t <: T$ can be read in all of the ways that one typically reads an expression of the form $t : T.$

There are several abuses of notation that are commonly tolerated in the use of covering relations. The worst offense is that of allowing symbols to stand equivocally either for individual strings or else for their types. There is a measure of consistency to this practice, considering the fact that perfectly individual entities are rarely if ever grasped by means of signs and finite expressions, which entails that every appearance of an apparent token is only a type of more particular tokens, and meaning in the end that there is never any recourse but to the sort of discerning interpretation that can decide just how each sign is intended. In view of all this, I continue to permit expressions like $t <: T$ and $T <: S,$ where any of the symbols $t, T, S$ can be taken to signify either the tokens or the subtypes of their covering types.

Note. For some time to come in the discussion that follows, although I will continue to focus on the cactus language as my principal object example, my more general purpose will be to develop the subject matter of the formal languages and grammars. I will do this by taking up a particular method of stepwise refinement and using it to extract a rigorous formal grammar for the cactus language, starting with little more than a rough description of the target language and applying a systematic analysis to develop a sequence of increasingly more effective and more exact approximations to the desired grammar.

Employing the notion of a covering relation it becomes possible to redescribe the cactus language in the following ways.

Grammar 1

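Grammar 1 is something of a misnomer. It is nowhere near exemplifying any kind of a standard form and it is only intended as a starting point for the initiation of more respectable grammars. Such as it is, it uses the terminal alphabet $\mathfrak{A} = \mathfrak{M} \cup \mathfrak{P}$ that comes with the territory of the cactus language $\mathfrak{C}(\mathfrak{P}),$ it specifies $\mathfrak{Q} = \varnothing,$ in other words, it employs no intermediate symbols, and it embodies the covering set $\mathfrak{K}$ as listed in the following display.

  $\mathfrak{C}(\mathfrak{P})$ : Grammar 1, with $\mathfrak{Q} = \varnothing$

  1. $S :> \mathfrak{m}_1$ = " "
  2. $S :> \mathfrak{p}_j,$ for each $j \in J$
  3. $S :> \mathrm{Conc}^0$ = ""
  4. $S :> \mathrm{Surc}^0$ = "()"
  5. $S :> S^*$
  6. $S :> \text{"("} \cdot S \cdot ( \text{","} \cdot S )^* \cdot \text{")"}$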

In this formulation, the last two lines specify that:

  1. The concept of a sentence in $\mathfrak{L}$ covers any concatenation of sentences in $\mathfrak{L},$ in effect, any number of freely chosen sentences that are available to be concatenated one after another.
  2. The concept of a sentence in $\mathfrak{L}$ covers any surcatenation of sentences in $\mathfrak{L},$ in effect, any string that opens with a "(", continues with a sentence, possibly empty, follows with a finite number of phrases of the form $\text{","} \cdot S,$ and closes with a ")".

This appears to be just about the most concise description of the cactus language that one can imagine, but there are a couple of problems that are commonly felt to afflict this style of presentation and to make it less than completely acceptable. Briefly stated, these problems turn on the following properties of the presentation:

  1. The invocation of the Kleene star operation is not reduced to a manifestly finitary form.
  2. The type $S$ that indicates a sentence is allowed to cover not only itself but also the empty string.

I will discuss these issues at first in general, and especially in regard to how the two features interact with one another, and then I return to address in further detail the questions that they engender on their individual bases.

In the process of developing a grammar for a language, it is possible to notice a number of organizational, pragmatic, and stylistic questions, whose moment to moment answers appear to decide the ongoing direction of the grammar that develops and the impact of whose considerations work in tandem to determine, or at least to influence, the sort of grammar that turns out. The issues that I can see arising at this point I can give the following prospective names, putting off the discussion of their natures and the treatment of their details to the points in the development of the present example where they evolve their full import.

  1. The degree of intermediate organization in a grammar.
  2. The distinction between empty and significant strings, and thus the distinction between empty and significant types of strings.
  3. The principle of intermediate significance. This is a constraint on the grammar that arises from considering the interaction of the first two issues.

In responding to these issues, it is advisable at first to proceed in a stepwise fashion, all the better to accommodate the chances of pursuing a series of parallel developments in the grammar, to allow for the possibility of reversing many steps in its development, indeed, to take into account the near certain necessity of having to revisit, to revise, and to reverse many decisions about how to proceed toward an optimal description or a satisfactory grammar for the language. Doing all this means exploring the effects of various alterations and innovations as independently from each other as possible.

The degree of intermediate organization in a grammar is measured by how many intermediate symbols it has and by how they interact with each other by means of its productions. With respect to this issue, Grammar 1 has no intermediate symbols at all, and therefore remains at an ostensibly trivial degree of intermediate organization. Some additions to the list of intermediate symbols are practically obligatory in order to arrive at any reasonable grammar at all; other inclusions appear to have a more optional character, though they are obviously useful from the standpoints of clarity and ease of comprehension.

One of the troubles that is perceived to affect Grammar 1 is that it wastes so much of the available potential for efficient description in recounting over and over again the simple fact that the empty string is present in the language. This arises in part from the statement that $S :> S^*,$ which implies that:

  $S :> S^* = \{ \varepsilon \} \cup S \cup S \cdot S \cup S \cdot S \cdot S \cup \ldots$

There is nothing wrong with the more expansive part of the covered equation, since it follows straightforwardly from the definition of the Kleene star operation, but the covering statement to the effect that $S :> S^*$ is not a very productive piece of information, in the sense of telling very much about the language that falls under the type of a sentence $S.$ In particular, since it implies that $S :> \{ \varepsilon \}$ and since $\{ \varepsilon \} \cdot \mathfrak{L} = \mathfrak{L} \cdot \{ \varepsilon \} = \mathfrak{L}$ for any formal language $\mathfrak{L},$ the empty string $\varepsilon$ is counted over and over in every term of the union, and every non-empty sentence under $S$ appears again and again in every term of the union that follows the initial appearance of $S.$ As a result, this style of characterization has to be classified as true but not very informative. If at all possible, one prefers to partition the language of interest into a disjoint union of subsets, thereby accounting for each sentence under its proper term, and one whose place under the sum serves as a useful parameter of its character or its complexity. In general, this form of description is not always possible to achieve, but it is usually worth the trouble to actualize it whenever it is.
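The redundancy described here can be observed on a small finite stand-in for the language. The sketch below assumes a sample set of three strings that contains the empty string, forms its first few powers under concatenation, and counts how many of those terms contain each string; `lang_product`, `lang_power`, and the sample set are hypothetical choices made for this illustration.

```python
def lang_product(l1, l2):
    """Concatenate every string of l1 with every string of l2."""
    return {x + y for x in l1 for y in l2}

def lang_power(lang, k):
    """The k-fold product of lang with itself; power 0 is {""}."""
    result = {""}
    for _ in range(k):
        result = lang_product(result, lang)
    return result

# A finite stand-in for a language that includes the empty string.
sample = {"", "a", "()"}

# The first four terms of the union: powers 0, 1, 2, and 3.
terms = [lang_power(sample, k) for k in range(4)]

# For each sample string, count the terms in which it appears.
counts = {s: sum(s in term for term in terms) for s in sample}
```

The empty string shows up in all four terms, and each non-empty sample string shows up in every term from the first power onward, so the union is far from a disjoint one.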

Suppose that one tries to deal with this problem by eliminating each use of the Kleene star operation, by reducing it to a purely finitary set of steps, or by finding an alternative way to cover the sublanguage that it is used to generate. This amounts, in effect, to recognizing a type, a complex process that involves the following steps:

  1. Noticing a category of strings that is generated by iteration or recursion.
  2. Acknowledging the fact that it needs to be covered by a non-terminal symbol.
  3. Making a note of it by instituting an explicitly-named grammatical category.

In sum, one introduces a non-terminal symbol for each type of sentence and each part of speech or sentential component that is generated by means of iteration or recursion under the ruling constraints of the grammar. In order to do this one needs to analyze the iteration of each grammatical operation in a way that is analogous to a mathematically inductive definition, but further in a way that is not forced explicitly to recognize a distinct and separate type of expression merely to account for and to recount every increment in the parameter of iteration.

Returning to the case of the cactus language, the process of recognizing an iterative type or a recursive type can be illustrated in the following way. The operative phrases in the simplest sort of recursive definition are its initial part and its generic part. For the cactus language one has the following definitions of concatenation as iterated precatenation and of surcatenation as iterated subcatenation, respectively:

In order to transform these recursive definitions into grammar rules, one introduces a new pair of intermediate symbols, “Conc” and “Surc”, corresponding to the operations that share the same names but ignoring the inflexions of their individual parameters $j$ and $k$. Recognizing the type of a sentence by means of the initial symbol “S” and interpreting “Conc” and “Surc” as names for the types of strings that are generated by concatenation and by surcatenation, respectively, one arrives at the following transformation of the ruling operator definitions into the form of covering grammar rules:

As given, this particular fragment of the intended grammar contains a couple of features that are desirable to amend.

  1. Given the covering $S :> \mathrm{Conc}$, the covering rule $\mathrm{Conc} :> \mathrm{Conc} \cdot S$ says no more than the covering rule $\mathrm{Conc} :> S \cdot S$. Consequently, all of the information contained in these two covering rules is already covered by the statement that $S :> S \cdot S$.
  2. A grammar rule that invokes a notion of decatenation, deletion, erasure, or any other sort of retrograde production, is frequently considered to be lacking in elegance, and there is a style of critique for grammars that holds it preferable to avoid these types of operations if it is at all possible to do so. Accordingly, contingent on the prescriptions of the informal rule in question, and pursuing the stylistic dictates that are writ in the realm of its aesthetic regime, it becomes necessary for us to backtrack a little bit, to temporarily withdraw the suggestion of employing these elliptical types of operations, but without, of course, eliding the record of doing so.

Grammar 2

One way to analyze the surcatenation of any number of sentences is to introduce an auxiliary type of string, not in general a sentence, but a proper component of any sentence that is formed by surcatenation. Doing this brings one to the following definition:

A tract is a concatenation of a finite sequence of sentences, with a literal comma “,” interpolated between each pair of adjacent sentences. Thus, a typical tract $T$ takes the form:

  $T = S_1 \cdot \text{“,”} \cdot \ldots \cdot \text{“,”} \cdot S_k$

A tract must be distinguished from the abstract sequence of sentences, $S_1, \ldots, S_k$, where the commas that appear to come to mind, as if being called up to separate the successive sentences of the sequence, remain as partially abstract conceptions, or as signs that retain a disengaged status on the borderline between the text and the mind. In effect, the types of commas that appear to follow in the abstract sequence continue to exist as conceptual abstractions and fail to be cognized in a wholly explicit fashion, whether as concrete tokens in the object language, or as marks in the strings of signs that are able to engage one's parsing attention.
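The difference between a tract, which is a single string containing literal commas, and the abstract sequence of sentences behind it can be sketched as follows. This is only an illustration: sentences are modelled as Python strings, the sequence as a list, and the function names `make_tract` and `split_tract` are hypothetical.

```python
def make_tract(sentences):
    """Join a sequence of sentences into one string with literal commas."""
    return ",".join(sentences)

def split_tract(tract):
    """Recover the sentences of a tract by splitting at top-level commas only.

    Commas nested inside parentheses belong to inner tracts and are kept.
    """
    parts, depth, current = [], 0, []
    for ch in tract:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts

sequence = ["a", "(b,c)", ""]   # abstract sequence: the commas here are Python syntax
tract = make_tract(sequence)    # concrete string: the commas are tokens of the text
print(tract)
print(split_tract(tract))
```

One limitation of this encoding is worth noting: the empty sequence and the sequence holding a single empty sentence both yield the empty tract, so `split_tract("")` can only return one of the two readings.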

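Returning to the case of the painted cactus language, it is possible to put the currently assembled pieces of a grammar together in the light of the presently adopted canons of style, to arrive at a more refined analysis of the fact that the concept of a sentence covers any concatenation of sentences and any surcatenation of sentences, and so to obtain the following form of a grammar:

  Grammar 2, with $\mathfrak{Q} = \{\text{“T”}\}$

  1. $S :> \varepsilon$
  2. $S :> m_1$
  3. $S :> p_j$, for each paint $p_j$
  4. $S :> S \cdot S$
  5. $S :> \text{“(”} \cdot T \cdot \text{“)”}$
  6. $T :> S$
  7. $T :> T \cdot \text{“,”} \cdot S$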

In this rendition, a string of type $T$ is not in general a sentence itself but a proper part of speech, that is, a strictly lesser component of a sentence in any suitable ordering of sentences and their components. In order to see how the grammatical category $T$ gets off the ground, that is, to detect its minimal strings and to discover how its ensuing generations get started from these, it is useful to observe that the covering rule $T :> S$ means that $T$ inherits all of the initial conditions of $S$, namely, $T :> \varepsilon, m_1, p_j$. In accord with these simple beginnings it comes to parse that the rule $T :> T \cdot \text{“,”} \cdot S$, with the substitutions $T = \varepsilon$ and $S = \varepsilon$ on the covered side of the rule, bears the germinal implication that $T :> \text{“,”}$.

Grammar 2 achieves a portion of its success through a higher degree of intermediate organization. Roughly speaking, the level of organization can be seen as reflected in the cardinality of the intermediate alphabet $\mathfrak{Q} = \{\text{“T”}\}$, but it is clearly not explained by this simple circumstance alone, since it is taken for granted that the intermediate symbols serve a purpose, a purpose that is easily recognizable but that may not be so easy to pin down and to specify exactly. Nevertheless, it is worth the trouble of exploring this aspect of organization and this direction of development a little further.

Grammar 3

Although it is not strictly necessary to do so, it is possible to organize the materials of our developing grammar in a slightly better fashion by recognizing two recurrent types of strings that appear in the typical cactus expression. In doing this, one arrives at the following two definitions:

A rune is a string of blanks and paints concatenated together. Thus, a typical rune $R$ is a string over $\{m_1\} \cup \mathfrak{P}$, possibly the empty string:

  $R \in (\{m_1\} \cup \mathfrak{P})^*$

When there is no possibility of confusion, the letter “R” can be used either as a string variable that ranges over the set of runes or else as a type name for the class of runes. The latter reading amounts to the enlistment of a fresh intermediate symbol, $\text{“R”} \in \mathfrak{Q}$, as a part of a new grammar for $\mathfrak{L}$. In effect, “R” affords a grammatical recognition for any rune that forms a part of a sentence in $\mathfrak{L}$. In situations where these variant usages are likely to be confused, the types of strings can be indicated by means of expressions like $r <: R$ and $W <: R$.

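A foil is a string of the form $\text{“(”} \cdot T \cdot \text{“)”}$, where $T$ is a tract. Thus, a typical foil $F$ has the form:

  $F = \text{“(”} \cdot S_1 \cdot \text{“,”} \cdot \ldots \cdot \text{“,”} \cdot S_k \cdot \text{“)”}$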

This is just the surcatenation of the sentences $S_1, \ldots, S_k$. Given the possibility that this sequence of sentences is empty, and thus that the tract $T$ is the empty string, the minimum foil $F$ is the expression “()”. Explicitly marking each foil $F$ that is embodied in a cactus expression is tantamount to recognizing another intermediate symbol, $\text{“F”} \in \mathfrak{Q}$, further articulating the structures of sentences and expanding the grammar for the language $\mathfrak{L}$. All of the same remarks about the versatile uses of the intermediate symbols, as string variables and as type names, apply again to the letter “F”.

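  Grammar 3, with $\mathfrak{Q} = \{\text{“F”}, \text{“R”}, \text{“T”}\}$

  1. $S :> R$
  2. $S :> F$
  3. $S :> S \cdot S$
  4. $R :> \varepsilon$
  5. $R :> m_1$
  6. $R :> p_j$, for each paint $p_j$
  7. $R :> R \cdot R$
  8. $F :> \text{“(”} \cdot T \cdot \text{“)”}$
  9. $T :> S$
  10. $T :> T \cdot \text{“,”} \cdot S$

In Grammar 3, the first three Rules say that a sentence (a string of type $S$), is a rune (a string of type $R$), a foil (a string of type $F$), or an arbitrary concatenation of strings of these two types. Rules 4 through 7 specify that a rune $R$ is an empty string $\varepsilon$, a blank symbol $m_1$, a paint $p_j$, or any concatenation of strings of these three types. Rule 8 characterizes a foil $F$ as a string of the form $\text{“(”} \cdot T \cdot \text{“)”}$, where $T$ is a tract. The last two Rules say that a tract $T$ is either a sentence $S$ or else the concatenation of a tract, a comma, and a sentence, in that order.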
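The division of labor among sentences, runes, foils, and tracts that Grammar 3 describes can be mirrored in a small recursive-descent recognizer. The following is a minimal sketch under stated assumptions: the blank is taken to be the space character, the paints are taken to be the single letters `a`, `b`, `c`, and all function names are illustrative rather than part of the text.

```python
BLANK = " "           # assumed concrete form of the blank symbol
PAINTS = set("abc")   # assumed single-character paints

def parse_sentence(s, i):
    """Consume a maximal run of rune characters and foils starting at index i.

    Returns the index just past the sentence. The run may be empty,
    since the empty string counts as a sentence.
    """
    while i < len(s):
        ch = s[i]
        if ch == BLANK or ch in PAINTS:
            i += 1                   # one more character of a rune
        elif ch == "(":
            i = parse_foil(s, i)     # a complete foil
        else:
            break                    # "," or ")" ends the sentence
    return i

def parse_foil(s, i):
    """Consume "(" tract ")" starting at index i, where s[i] == "("."""
    i = parse_sentence(s, i + 1)             # a tract begins with a sentence
    while i < len(s) and s[i] == ",":
        i = parse_sentence(s, i + 1)         # tract, comma, sentence
    if i < len(s) and s[i] == ")":
        return i + 1
    raise ValueError("unclosed foil")

def is_sentence(s):
    """True if the whole string s parses as a sentence."""
    try:
        return parse_sentence(s, 0) == len(s)
    except ValueError:
        return False

for candidate in ["", "a b", "()", "(a,(b),)", "(a", "a,b"]:
    print(repr(candidate), is_sentence(candidate))
```

The recognizer treats a rune as a run of individual characters rather than as a separately named unit, which is enough for recognition; a parser that builds trees would keep the rune and foil categories explicit.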

At this point in the succession of grammars for $\mathfrak{L}$, the explicit uses of indefinite iterations, like the Kleene star operator, are now completely reduced to finite forms of concatenation, but the problems that some styles of analysis have with allowing non-terminal symbols to cover both themselves and the empty string are still present.

Any degree of reflection on this difficulty raises the general question: What is a practical strategy for accounting for the empty string in the organization of any formal language that counts it among its sentences? One answer that presents itself is this: If the empty string belongs to a formal language, it suffices to count it once at the beginning of the formal account that enumerates its sentences and then to move on to more interesting materials.

Returning to the case of the cactus language $\mathfrak{L} = \mathrm{PARCE}(\mathfrak{P})$, in other words, the formal language $\mathrm{PARCE}$ of painted and rooted cactus expressions, it serves the purpose of efficient accounting to partition the language into the following couple of sublanguages:

  1. The emptily painted and rooted cactus expressions make up the language $\mathrm{EPARCE}$ that consists of a single empty string as its only sentence. In short: $\mathrm{EPARCE} = \{\varepsilon\}$.

  2. The significantly painted and rooted cactus expressions make up the language $\mathrm{SPARCE}$ that consists of everything else, namely, all of the non-empty strings in the language $\mathrm{PARCE}$. In sum: $\mathrm{SPARCE} = \mathrm{PARCE} \setminus \{\varepsilon\}$.

As a result of marking the distinction between empty and significant sentences, that is, by categorizing each of these three classes of strings as an entity unto itself and by conceptualizing the whole of its membership as falling under a distinctive symbol, one obtains an equation of sets that connects the three languages being marked:

  $\mathrm{SPARCE} = \mathrm{PARCE} - \mathrm{EPARCE}$

In sum, one has the disjoint union:

  $\mathrm{PARCE} = \mathrm{EPARCE} \cup \mathrm{SPARCE}$

For brevity in the present case, and to serve as a generic device in any similar array of situations, let $S$ be the type of an arbitrary sentence, possibly empty, and let $S'$ be the type of a specifically non-empty sentence. In addition, let $\varepsilon$ be the type of the empty sentence, in effect, the language that contains a single empty string, and let a plus sign “+” signify a disjoint union of types. In the most general type of situation, where the type $S$ is permitted to include the empty string, one notes the following relation among types:

  $S = \varepsilon + S'$

With the distinction between empty and significant expressions in mind, I return to the grasp of the cactus language that is afforded by Grammar 2, and, taking that as a point of departure, explore other avenues of possible improvement in the comprehension of these expressions. In order to observe the effects of this alteration as clearly as possible, in isolation from any other potential factors, it is useful to strip away the higher levels of intermediate organization that are present in Grammar 3, and start again with a single intermediate symbol, as used in Grammar 2. One way of carrying out this strategy leads on to a grammar of the variety that will be articulated next.

Grammar 4

If one imposes the distinction between empty and significant types on each non-terminal symbol in Grammar 2, then the non-terminal symbols “S” and “T” give rise to the expanded set of non-terminal symbols “S”, “S'”, “T”, “T'”, leaving the last three of these to form the new intermediate alphabet. Grammar 4 has the intermediate alphabet $\mathfrak{Q} = \{\text{“S'”}, \text{“T”}, \text{“T'”}\}$, with the set $\mathfrak{K}$ of covering rules as listed in the next display.

In this version of a grammar for $\mathfrak{L}$, the intermediate type $T$ is partitioned as $T = \varepsilon + T'$, thereby parsing the intermediate symbol $T$ in parallel fashion with the division of its overlying type as $S = \varepsilon + S'$. This is an option that I will choose to close off for now, but leave it open to consider at a later point. Thus, it suffices to give a brief discussion of what it involves, in the process of moving on to its chief alternative.

There does not appear to be anything radically wrong with trying this approach to types. It is reasonable and consistent in its underlying principle, and it provides a rational and a homogeneous strategy toward all parts of speech, but it does require an extra amount of conceptual overhead, in that every non-trivial type has to be split into two parts and comprehended in two stages. Consequently, in view of the largely practical difficulties of making the requisite distinctions for every intermediate symbol, it is a common convention, whenever possible, to restrict intermediate types to covering exclusively non-empty strings.

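For the sake of future reference, it is convenient to refer to this restriction on intermediate symbols as the intermediate significance constraint. It can be stated in a compact form as a condition on the relations between non-terminal symbols $q \in \{\text{“S”}\} \cup \mathfrak{Q}$ and sentential forms $W \in \{\text{“S”}\} \cup (\mathfrak{Q} \cup \mathfrak{A})^*$:

  If $q :> W$ and $W = \varepsilon$, then $q = \text{“S”}$.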

If this is beginning to sound like a monotone condition, then it is not absurd to sharpen the resemblance and render the likeness more acute. This is done by declaring a couple of ordering relations, denoting them under variant interpretations by the same sign, “<”.

  1. The ordering “<” on the set of non-terminal symbols, $q \in \{\text{“S”}\} \cup \mathfrak{Q}$, ordains the initial symbol “S” to be strictly prior to every intermediate symbol. This is tantamount to the axiom that $\text{“S”} < q$, for all $q \in \mathfrak{Q}$.
  2. The ordering “<” on the collection of sentential forms, $W \in \{\text{“S”}\} \cup (\mathfrak{Q} \cup \mathfrak{A})^*$, ordains the empty string to be strictly minor to every other sentential form. This is stipulated in the axiom that $\varepsilon < W$, for every non-empty sentential form $W$.

Given these two orderings, the constraint in question on intermediate significance can be stated as follows:

  If $q :> W$ and $q > \text{“S”}$, then $W > \varepsilon$.

Achieving a grammar that respects this convention typically requires a more detailed account of the initial setting of a type, both with regard to the type of context that incites its appearance and also with respect to the minimal strings that arise under the type in question. In order to find covering productions that satisfy the intermediate significance condition, one must be prepared to consider a wider variety of calling contexts or inciting situations that can be noted to surround each recognized type, and also to enumerate a larger number of the smallest cases that can be observed to fall under each significant type.
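A rule-by-rule check of the intermediate significance constraint is easy to sketch. The example below assumes that covering rules are stored as pairs of a non-terminal and a tuple of symbols, with the empty tuple standing for the empty string. The rule list follows the prose description of Grammar 3 given earlier, with `m` and `p` as placeholder names for the blank and a generic paint; the function name is hypothetical.

```python
# Covering rules as (non-terminal, replacement) pairs.
# The empty tuple () stands for the empty string.
GRAMMAR_3 = [
    ("S", ("R",)), ("S", ("F",)), ("S", ("S", "S")),
    ("R", ()), ("R", ("m",)), ("R", ("p",)), ("R", ("R", "R")),
    ("F", ("(", "T", ")")),
    ("T", ("S",)), ("T", ("T", ",", "S")),
]

def significance_violations(rules, start="S"):
    """Return the rules in which an intermediate symbol directly covers
    the empty string. Only rules as written are inspected; coverings that
    arise by chaining several rules are not traced."""
    return [(q, w) for q, w in rules if q != start and len(w) == 0]

print(significance_violations(GRAMMAR_3))
```

On this encoding the only direct offender is the rule that lets a rune be empty. Indirect cases, such as a tract covering a sentence that covers an empty rune, would need a nullability analysis, which this sketch leaves out.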

Grammar 5

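With the foregoing array of considerations in mind, one is gradually led to a grammar for $\mathfrak{L}$ in which all of the covering productions have either one of the following two forms:

  $S :> \varepsilon$
  $q :> W$, with $q \in \{\text{“S”}\} \cup \mathfrak{Q}$ and $W \in (\mathfrak{Q} \cup \mathfrak{A})^+$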

A grammar that fits into this mold is called a context-free grammar. The first type of rewrite rule is referred to as a special production, while the second type of rewrite rule is called an ordinary production. An ordinary derivation is one that employs only ordinary productions. In ordinary productions, those that have the form $q :> W$, the replacement string $W$ is never the empty string, and so the lengths of the augmented strings or the sentential forms that follow one another in an ordinary derivation, on account of using the ordinary types of rewrite rules, never decrease at any stage of the process, up to and including the terminal string that is finally generated by the grammar. This type of feature is known as the non-contracting property of productions, derivations, and grammars. A grammar is said to have the non-contracting property if all of its covering productions, with the possible exception of $S :> \varepsilon$, are non-contracting. In particular, context-free grammars are special cases of non-contracting grammars. The presence of the non-contracting property within a formal grammar makes the length of the augmented string available as a parameter that can figure into mathematical inductions and motivate recursive proofs, and this handle on the generative process makes it possible to establish the kinds of results about the generated language that are not easy to achieve in more general cases, nor by any other means even in these brands of special cases.

Grammar 5 is a context-free grammar for the painted cactus language that uses $\mathfrak{Q} = \{\text{“S'”}, \text{“T”}\}$, with $\mathfrak{K}$ as listed in the next display.

Finally, it is worth trying to bring together the advantages of these diverse styles of grammar, to whatever extent that they are compatible. To do this, a prospective grammar must be capable of maintaining a high level of intermediate organization, like that arrived at in Grammar 2, while respecting the principle of intermediate significance, and thus accumulating all the benefits of the context-free format in Grammar 5. A plausible synthesis of most of these features is given in Grammar 6.

Grammar 6

Grammar 6 has the intermediate alphabet $\mathfrak{Q} = \{\text{“S'”}, \text{“F”}, \text{“R”}, \text{“T”}\}$, with the production set $\mathfrak{K}$ as listed in the next display.

The preceding development provides a typical example of how an initially effective and conceptually succinct description of a formal language, but one that is terse to the point of allowing its prospective interpreter to waste exorbitant amounts of energy in trying to unravel its implications, can be converted into a form that is more efficient from the operational point of view, even if slightly more ungainly in regard to its elegance.

The basic idea behind all of this machinery remains the same: Besides the select body of formulas that are introduced as boundary conditions, it merely institutes the following general rule:

  If the strings $S_1, \ldots, S_k$ are sentences,
  then their concatenation in the form $\mathrm{Conc}_{j=1}^{k} S_j$ is a sentence,
  and their surcatenation in the form $\mathrm{Surc}_{j=1}^{k} S_j$ is a sentence.
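The general rule above can be sketched as a pair of string-building functions. This is a minimal illustration that models sentences as Python strings; the names `conc` and `surc` echo the operations in the text, but the implementation is an assumption rather than the author's own definition.

```python
def conc(sentences):
    """Concatenation: the sentences written one after another."""
    return "".join(sentences)

def surc(sentences):
    """Surcatenation: the sentences separated by commas and enclosed
    in parentheses. An empty sequence gives the minimum foil "()"."""
    return "(" + ",".join(sentences) + ")"

inner = surc(["a", "b"])         # a foil built from two paints
outer = conc(["c", inner, " "])  # a sentence built from a paint, a foil, a blank
print(inner)
print(repr(outer))
print(surc([inner, outer]))      # the outputs are sentences, so they nest freely
```

Since both functions return strings of the same kind that they accept, the closure property stated in the rule shows up here as the freedom to feed any output back in as an input.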

Generalities About Formal Grammars

It is fitting to wrap up the foregoing developments by summarizing the notion of a formal grammar that appeared to evolve in the present case. For the sake of future reference and the chance of a wider application, it is also useful to try to extract the scheme of a formalization that potentially holds for any formal language. The following presentation of the notion of a formal grammar is adapted, with minor modifications, from the treatment in (DDQ, 60–61).

A formal grammar $\mathfrak{G}$ is given by a four-tuple $\mathfrak{G} = (\text{“S”}, \mathfrak{Q}, \mathfrak{A}, \mathfrak{K})$ that takes the following form of description:

  1. “S” is the initial, special, start, or sentence symbol. Since the letter “S” serves this function only in a special setting, its employment in this role need not create any confusion with its other typical uses as a string variable or as a sentence variable.
  2. $\mathfrak{Q} = \{q_1, \ldots, q_m\}$ is a finite set of intermediate symbols, all distinct from “S”.
  3. $\mathfrak{A} = \{a_1, \ldots, a_n\}$ is a finite set of terminal symbols, also known as the alphabet of $\mathfrak{G}$, all distinct from “S” and disjoint from $\mathfrak{Q}$. Depending on the particular conception of the language $\mathfrak{L}$ that is covered, generated, governed, or ruled by the grammar $\mathfrak{G}$, that is, whether $\mathfrak{L}$ is conceived to be a set of words, sentences, paragraphs, or more extended structures of discourse, it is usual to describe $\mathfrak{A}$ as the alphabet, lexicon, vocabulary, liturgy, or phrase book of both the grammar $\mathfrak{G}$ and the language $\mathfrak{L}$ that it regulates.
  4. $\mathfrak{K}$ is a finite set of characterizations. Depending on how they come into play, these are variously described as covering rules, formations, productions, rewrite rules, subsumptions, transformations, or typing rules.

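To describe the elements of $\mathfrak{K}$ it helps to define some additional terms:

  1. The symbols in $\{\text{“S”}\} \cup \mathfrak{Q} \cup \mathfrak{A}$ form the augmented alphabet of $\mathfrak{G}$.
  2. The symbols in $\{\text{“S”}\} \cup \mathfrak{Q}$ are the non-terminal symbols of $\mathfrak{G}$.
  3. The symbols in $\mathfrak{Q} \cup \mathfrak{A}$ are the non-initial symbols of $\mathfrak{G}$.
  4. The strings in $(\{\text{“S”}\} \cup \mathfrak{Q} \cup \mathfrak{A})^*$ are the augmented strings for $\mathfrak{G}$.
  5. The strings in $\{\text{“S”}\} \cup (\mathfrak{Q} \cup \mathfrak{A})^*$ are the sentential forms for $\mathfrak{G}$.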

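Each characterization in $\mathfrak{K}$ is an ordered pair of strings $(S_1, S_2)$ that takes the following form:

  $S_1 = Q_1 \cdot q \cdot Q_2$, $S_2 = Q_1 \cdot W \cdot Q_2$

In this scheme, $S_1$ and $S_2$ are members of the augmented strings for $\mathfrak{G}$, more precisely, $S_1$ is a non-empty string and a sentential form over $\mathfrak{G}$, while $S_2$ is a possibly empty string and also a sentential form over $\mathfrak{G}$.

Here also, $q$ is a non-terminal symbol, that is, $q \in \{\text{“S”}\} \cup \mathfrak{Q}$, while $Q_1$, $Q_2$, and $W$ are possibly empty strings of non-initial symbols, a fact that can be expressed in the form:

  $Q_1, Q_2, W \in (\mathfrak{Q} \cup \mathfrak{A})^*$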

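In practice, the couplets in $\mathfrak{K}$ are used to derive, to generate, or to produce sentences of the corresponding language $\mathfrak{L} = \mathfrak{L}(\mathfrak{G})$. The language $\mathfrak{L}$ is then said to be governed, licensed, or regulated by the grammar $\mathfrak{G}$, a circumstance that is expressed in the form $\mathfrak{L} = \langle \mathfrak{G} \rangle$. In order to facilitate this active employment of the grammar, it is conventional to write the abstract characterization $(S_1, S_2)$ and the specific characterization $(Q_1 \cdot q \cdot Q_2, \ Q_1 \cdot W \cdot Q_2)$ in the following forms, respectively:

  $S_1 :> S_2$
  $Q_1 \cdot q \cdot Q_2 :> Q_1 \cdot W \cdot Q_2$

In this usage, the characterization $S_1 :> S_2$ is tantamount to a grammatical license to transform a string of the form $Q_1 \cdot q \cdot Q_2$ into a string of the form $Q_1 \cdot W \cdot Q_2$, in effect, replacing the non-terminal symbol $q$ with the non-initial string $W$ in any selected, preserved, and closely adjoining context of the form $Q_1 \cdot [\ ] \cdot Q_2$. In this application the notation $S_1 :> S_2$ can be read to say that $S_1$ produces $S_2$ or that $S_1$ transforms into $S_2$.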

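An immediate derivation in $\mathfrak{G}$ is an ordered pair $(W, W')$ of sentential forms in $\mathfrak{G}$ such that:

  $W = Q_1 \cdot X \cdot Q_2$, $W' = Q_1 \cdot Y \cdot Q_2$, and $(X, Y) \in \mathfrak{K}$

As noted above, it is usual to express the condition $(X, Y) \in \mathfrak{K}$ by writing $X :> Y$ in $\mathfrak{G}$.

The immediate derivation relation is indicated by saying that $W$ immediately derives $W'$, by saying that $W'$ is immediately derived from $W$ in $\mathfrak{G}$, and also by writing:

  $W ::> W'$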

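A derivation in $\mathfrak{G}$ is a finite sequence $(W_1, \ldots, W_k)$ of sentential forms over $\mathfrak{G}$ such that each adjacent pair $(W_j, W_{j+1})$ of sentential forms in the sequence is an immediate derivation in $\mathfrak{G}$, in other words, such that:

  $W_j ::> W_{j+1}$, for all $j = 1$ to $k - 1$

If there exists a derivation $(W_1, \ldots, W_k)$ in $\mathfrak{G}$, one says that $W_1$ derives $W_k$ in $\mathfrak{G}$ or that $W_k$ is derivable from $W_1$ in $\mathfrak{G}$, and one typically summarizes the derivation by writing:

  $W_1 :\!*\!:> W_k$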

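The language $\mathfrak{L} = \mathfrak{L}(\mathfrak{G}) = \langle \mathfrak{G} \rangle$ that is generated by the formal grammar $\mathfrak{G} = (\text{“S”}, \mathfrak{Q}, \mathfrak{A}, \mathfrak{K})$ is the set of strings over the terminal alphabet $\mathfrak{A}$ that are derivable from the initial symbol “S” by way of the intermediate symbols in $\mathfrak{Q}$ according to the characterizations in $\mathfrak{K}$. In sum:

  $\mathfrak{L}(\mathfrak{G}) = \langle \mathfrak{G} \rangle = \{ W \in \mathfrak{A}^* : \text{“S”} :\!*\!:> W \}$

Finally, a string $W$ is called a word, a sentence, or so on, of the language generated by $\mathfrak{G}$ if and only if $W$ is in $\mathfrak{L}(\mathfrak{G})$.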
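The notions of immediate derivation, derivation, and generated language can be exercised on a toy example. The sketch below assumes a small illustrative grammar, not one of the numbered grammars of the text, in which the intermediate symbol `N`, the terminal `a`, and all function names are hypothetical. Sentential forms are tuples of symbols, and each rule rewrites a single non-terminal in any context.

```python
from collections import deque

# A toy grammar as a four-tuple (start, intermediates, terminals, rules).
START = "S"
INTERMEDIATES = {"N"}
TERMINALS = {"a", "(", ")"}
RULES = [
    ("S", ()),                 # special production: S covers the empty string
    ("S", ("N",)),
    ("N", ("a",)),
    ("N", ("(", ")")),
    ("N", ("(", "N", ")")),
    ("N", ("N", "N")),
]

def immediate_derivations(form, rules):
    """Yield every form obtained by rewriting one non-terminal occurrence,
    leaving the context on both sides untouched."""
    for i, symbol in enumerate(form):
        for lhs, rhs in rules:
            if symbol == lhs:
                yield form[:i] + rhs + form[i + 1:]

def language(start, terminals, rules, max_len):
    """Collect the terminal strings of length <= max_len derivable from start."""
    seen = {(start,)}
    queue = deque(seen)
    words = set()
    while queue:
        form = queue.popleft()
        if all(sym in terminals for sym in form):
            words.add("".join(form))
            continue
        for nxt in immediate_derivations(form, rules):
            # Pruning by length is safe for this grammar: apart from the
            # special production on S, no rule shortens a form.
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return words

print(sorted(language(START, TERMINALS, RULES, 3)))
```

The breadth-first search is a brute-force reading of the definition, and the length bound is what makes it terminate. It relies on the non-contracting shape of the toy rules; for a grammar in which intermediate symbols can cover the empty string, discarding long forms could miss short sentences.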