This site is supported by donations to The OEIS Foundation.

# Cactus Language • Part 2

Author: Jon Awbrey

## The Cactus Patch (cont.)

It is fitting to wrap up the foregoing developments by summarizing the notion of a formal grammar that appeared to evolve in the present case. For the sake of future reference and the chance of a wider application, it is also useful to try to extract the scheme of a formalization that potentially holds for any formal language. The following presentation of the notion of a formal grammar is adapted, with minor modifications, from the treatment in (DDQ, 60–61).

A formal grammar ${\displaystyle {\mathfrak {G}}}$ is given by a four-tuple ${\displaystyle {\mathfrak {G}}=(\,^{\backprime \backprime }S\,^{\prime \prime },\,{\mathfrak {Q}},\,{\mathfrak {A}},\,{\mathfrak {K}}\,)}$ that takes the following form of description:

1. ${\displaystyle ^{\backprime \backprime }S\,^{\prime \prime }}$ is the initial, special, start, or sentence symbol. Since the letter ${\displaystyle ^{\backprime \backprime }S\,^{\prime \prime }}$ serves this function only in a special setting, its employment in this role need not create any confusion with its other typical uses as a string variable or as a sentence variable.
2. ${\displaystyle {\mathfrak {Q}}=\{q_{1},\ldots ,q_{m}\}}$ is a finite set of intermediate symbols, all distinct from ${\displaystyle ^{\backprime \backprime }S\,^{\prime \prime }.}$
3. ${\displaystyle {\mathfrak {A}}=\{a_{1},\dots ,a_{n}\}}$ is a finite set of terminal symbols, also known as the alphabet of ${\displaystyle {\mathfrak {G}},}$ all distinct from ${\displaystyle ^{\backprime \backprime }S\,^{\prime \prime }}$ and disjoint from ${\displaystyle {\mathfrak {Q}}.}$ Depending on the particular conception of the language ${\displaystyle {\mathfrak {L}}}$ that is covered, generated, governed, or ruled by the grammar ${\displaystyle {\mathfrak {G}},}$ that is, whether ${\displaystyle {\mathfrak {L}}}$ is conceived to be a set of words, sentences, paragraphs, or more extended structures of discourse, it is usual to describe ${\displaystyle {\mathfrak {A}}}$ as the alphabet, lexicon, vocabulary, liturgy, or phrase book of both the grammar ${\displaystyle {\mathfrak {G}}}$ and the language ${\displaystyle {\mathfrak {L}}}$ that it regulates.
4. ${\displaystyle {\mathfrak {K}}}$ is a finite set of characterizations. Depending on how they come into play, these are variously described as covering rules, formations, productions, rewrite rules, subsumptions, transformations, or typing rules.

To describe the elements of ${\displaystyle {\mathfrak {K}}}$ it helps to define some additional terms:

1. The symbols in ${\displaystyle \{\,^{\backprime \backprime }S\,^{\prime \prime }\,\}\cup {\mathfrak {Q}}\cup {\mathfrak {A}}}$ form the augmented alphabet of ${\displaystyle {\mathfrak {G}}.}$
2. The symbols in ${\displaystyle \{\,^{\backprime \backprime }S\,^{\prime \prime }\,\}\cup {\mathfrak {Q}}}$ are the non-terminal symbols of ${\displaystyle {\mathfrak {G}}.}$
3. The symbols in ${\displaystyle {\mathfrak {Q}}\cup {\mathfrak {A}}}$ are the non-initial symbols of ${\displaystyle {\mathfrak {G}}.}$
4. The strings in ${\displaystyle (\{\,^{\backprime \backprime }S\,^{\prime \prime }\,\}\cup {\mathfrak {Q}}\cup {\mathfrak {A}})^{*}}$ are the augmented strings for ${\displaystyle {\mathfrak {G}}.}$
5. The strings in ${\displaystyle \{\,^{\backprime \backprime }S\,^{\prime \prime }\,\}\cup ({\mathfrak {Q}}\cup {\mathfrak {A}})^{*}}$ are the sentential forms for ${\displaystyle {\mathfrak {G}}.}$

Each characterization in ${\displaystyle {\mathfrak {K}}}$ is an ordered pair of strings ${\displaystyle (S_{1},S_{2})}$ that takes the following form:

 ${\displaystyle S_{1}\ =\ Q_{1}\cdot q\cdot Q_{2},}$ ${\displaystyle S_{2}\ =\ Q_{1}\cdot W\cdot Q_{2}.}$

In this scheme, ${\displaystyle S_{1}}$ and ${\displaystyle S_{2}}$ are members of the augmented strings for ${\displaystyle {\mathfrak {G}},}$ more precisely, ${\displaystyle S_{1}}$ is a non-empty string and a sentential form over ${\displaystyle {\mathfrak {G}},}$ while ${\displaystyle S_{2}}$ is a possibly empty string and also a sentential form over ${\displaystyle {\mathfrak {G}}.}$

Here also, ${\displaystyle q}$ is a non-terminal symbol, that is, ${\displaystyle q\in \{\,^{\backprime \backprime }S\,^{\prime \prime }\,\}\cup {\mathfrak {Q}},}$ while ${\displaystyle Q_{1},Q_{2},}$ and ${\displaystyle W}$ are possibly empty strings of non-initial symbols, a fact that can be expressed in the form, ${\displaystyle Q_{1},Q_{2},W\in ({\mathfrak {Q}}\cup {\mathfrak {A}})^{*}.}$

In practice, the couplets in ${\displaystyle {\mathfrak {K}}}$ are used to derive, to generate, or to produce sentences of the corresponding language ${\displaystyle {\mathfrak {L}}={\mathfrak {L}}({\mathfrak {G}}).}$ The language ${\displaystyle {\mathfrak {L}}}$ is then said to be governed, licensed, or regulated by the grammar ${\displaystyle {\mathfrak {G}},}$ a circumstance that is expressed in the form ${\displaystyle {\mathfrak {L}}=\langle {\mathfrak {G}}\rangle .}$ In order to facilitate this active employment of the grammar, it is conventional to write the abstract characterization ${\displaystyle (S_{1},S_{2})}$ and the specific characterization ${\displaystyle (Q_{1}\cdot q\cdot Q_{2},\ Q_{1}\cdot W\cdot Q_{2})}$ in the following forms, respectively:

 ${\displaystyle {\begin{array}{lll}S_{1}&:>&S_{2}\\Q_{1}\cdot q\cdot Q_{2}&:>&Q_{1}\cdot W\cdot Q_{2}\\\end{array}}}$

In this usage, the characterization ${\displaystyle S_{1}:>S_{2}}$ is tantamount to a grammatical license to transform a string of the form ${\displaystyle Q_{1}\cdot q\cdot Q_{2}}$ into a string of the form ${\displaystyle Q1\cdot W\cdot Q2,}$ in effect, replacing the non-terminal symbol ${\displaystyle q}$ with the non-initial string ${\displaystyle W}$ in any selected, preserved, and closely adjoining context of the form ${\displaystyle Q1\cdot {\underline {~~~}}\cdot Q2.}$ In this application the notation ${\displaystyle S_{1}:>S_{2}}$ can be read to say that ${\displaystyle S_{1}}$ produces ${\displaystyle S_{2}}$ or that ${\displaystyle S_{1}}$ transforms into ${\displaystyle S_{2}.}$

An immediate derivation in ${\displaystyle {\mathfrak {G}}}$ is an ordered pair ${\displaystyle (W,W^{\prime })}$ of sentential forms in ${\displaystyle {\mathfrak {G}}}$ such that:

 ${\displaystyle {\begin{array}{llll}W=Q_{1}\cdot X\cdot Q_{2},&W'=Q_{1}\cdot Y\cdot Q_{2},&{\text{and}}&(X,Y)\in {\mathfrak {K}}.\end{array}}}$

As noted above, it is usual to express the condition ${\displaystyle (X,Y)\in {\mathfrak {K}}}$ by writing ${\displaystyle X:>Y\,{\text{in}}\,{\mathfrak {G}}.}$

The immediate derivation relation is indicated by saying that ${\displaystyle W}$ immediately derives ${\displaystyle W',}$ by saying that ${\displaystyle W'}$ is immediately derived from ${\displaystyle W}$ in ${\displaystyle {\mathfrak {G}},}$ and also by writing:

 ${\displaystyle W::>W'.}$

A derivation in ${\displaystyle {\mathfrak {G}}}$ is a finite sequence ${\displaystyle (W_{1},\ldots ,W_{k})}$ of sentential forms over ${\displaystyle {\mathfrak {G}}}$ such that each adjacent pair ${\displaystyle (W_{j},W_{j+1})}$ of sentential forms in the sequence is an immediate derivation in ${\displaystyle {\mathfrak {G}},}$ in other words, such that:

 ${\displaystyle W_{j}::>W_{j+1},\ {\text{for all}}\ j=1\ {\text{to}}\ k-1.}$

If there exists a derivation ${\displaystyle (W_{1},\ldots ,W_{k})}$ in ${\displaystyle {\mathfrak {G}},}$ one says that ${\displaystyle W_{1}}$ derives ${\displaystyle W_{k}}$ in ${\displaystyle {\mathfrak {G}}}$ or that ${\displaystyle W_{k}}$ is derivable from ${\displaystyle W_{1}}$ in ${\displaystyle {\mathfrak {G}},}$ and one typically summarizes the derivation by writing:

 ${\displaystyle W_{1}:\!*\!:>W_{k}.}$

The language ${\displaystyle {\mathfrak {L}}={\mathfrak {L}}({\mathfrak {G}})=\langle {\mathfrak {G}}\rangle }$ that is generated by the formal grammar ${\displaystyle {\mathfrak {G}}=(\,^{\backprime \backprime }S\,^{\prime \prime },\,{\mathfrak {Q}},\,{\mathfrak {A}},\,{\mathfrak {K}}\,)}$ is the set of strings over the terminal alphabet ${\displaystyle {\mathfrak {A}}}$ that are derivable from the initial symbol ${\displaystyle ^{\backprime \backprime }S\,^{\prime \prime }}$ by way of the intermediate symbols in ${\displaystyle {\mathfrak {Q}}}$ according to the characterizations in ${\displaystyle {\mathfrak {K}}.}$ In sum:

 ${\displaystyle {\mathfrak {L}}({\mathfrak {G}})\ =\ \langle {\mathfrak {G}}\rangle \ =\ \{\,W\in {\mathfrak {A}}^{*}\,:\,^{\backprime \backprime }S\,^{\prime \prime }\,:\!*\!:>\,W\,\}.}$

Finally, a string ${\displaystyle W}$ is called a word, a sentence, or so on, of the language generated by ${\displaystyle {\mathfrak {G}}}$ if and only if ${\displaystyle W}$ is in ${\displaystyle {\mathfrak {L}}({\mathfrak {G}}).}$