This site is supported by donations to The OEIS Foundation.

T-file proposal

From OeisWiki
Jump to: navigation, search

This is a proposal of a text file format called t-file for listing the terms of higher-dimensional arrays, including triangles, pyramids, and irregular arrays; i. e., the collections of integer numbers indexed by an ordered set of integers rather than single integers. This is an extension of the b-file format.

Motivation

The motivation for this proposed file format is to provide a flexible machine-readable description of two- and higher-dimensional arrays, including irregular ones.

There is plenty of array sequences in the database in various "linearized" forms (normally indicated by keyword tabl or tabf). The Data field, the Offset field and even the b-file cannot provide sufficient descriptions of such sequences in their original, higher-dimensional form, so Example or Comments fields are normally used for that. However, the descriptions given there are not machine-readable, so if one is going to work with the data of such sequence, they should parse the b-file and re-interpret it as an array.

The reinterpretation should be performed in different ways for different sequences and always requires external (with respect to the b-file) information about the offsets of indices, the lengths of rows, and so on. So it would be much more convenient to have a unified way to store information about such sequences in their natural form. t-file could be such a way.

Also, neither the Data nor the b-file may contain gaps.

See below the specific examples of sequences that would benefit from using t-files.

The acceptance of this proposal implies implementation of a special interface for uploading t-files to the OEIS that would check the uploaded t-files for validity and encouraging contributors to use t-files instead of b-files for higher-dimensional sequences. A tool for searching in the t-files will be the next step.

t-file specification

General properties

This specification does not cover the name of a t-file since it is related only to the content of the file.

The recommended encoding of the file is UTF-8 without BOM, and the recommended newline mark is the linefeed character (Unix-style). Both these features may be enforced during the processing of the t-file by the server; i. e., the preservation of non-recommended encoding or newline mark is never guaranteed.

The non-7-bit-ASCII characters must be used only inside the comments (see below) except the minus sign, and it is recommended not to use them at all.

Structure

Leading and trailing whitespace characters are allowed at any line.

Any line may contain a comment starting with "#" (it is recommended that the next symbol should be space since future extensions of the t-file format may make some use of the comments starting with "#" followed by some non-space symbol); all characters from the "#" to the end of the line are ignored.

Apart from the leading and trailing whitespaces and a comment, any line must be either an empty line or a data line containing d+1 integers separated by whitespace characters where d is the dimension of the array; d is the same for the whole sequence. See below for notes about data lines.

The lines may follow in any order. It is recommended that any line does not exceed 1000 characters in length. The trailing linefeed symbol at the end of the final line is recommended and may be enforced during the processing of the t-file by the server.

Data lines

An integer in a data line must be represented by a string of decimal digits, optionally prepended by hyphen-minus "-" U+002D (or minus sign "" U+2212, which is not recommended) interpreted as the unary minus.

The first d numbers of a data line are indices, and the last one is the term. E. g., for a two-dimensional array A(n,k) the order of numbers in the data line must be "n k A(n,k)". There must not be two data lines with the same set of indices. There must be at least one data line.

Recommendations on the order of the data lines

The order of the data lines may be arbitrary. The choice of order of indices depends on the definition of the sequence. However, there are recommendations:

  • if one can fix values of some indices and the other indices become bounded (e. g., for a particular number of row of a triangle the number of term in the row is bounded), then the bounded indices go last, and they together run over all their possible values while the other indices are fixed (e. g., triangles and irregular arrays by layers);
  • for unbounded indices, bound them either
    • by some constants (e. g., for infinite two-dimensional arrays give first N numbers of each row for some N)
    • or using slices with constant sums of indices (e. g., infinite two-dimensional arrays by antidiagonals);
  • inside a given domain of bounded indices, use either
    • the lexicographically ascending order,
    • or the lexicographically descending order,
    • or also using slices with constant sums (see A144625);
  • alternatively, the lines with the same value of the term may be grouped together, or some other principle based on the values of the terms may be used.

Examples of valid t-files

b-files

Any valid b-file or even loose b-file is also a valid t-file. However, for higher-dimensional sequences the interpretation of a b-file as a t-file is semantically wrong.

If one wants to use a t-file instead of a b-file for a one-dimensional sequence, it is recommended to use a single tabulation instead of a space as a separator (that satisfies the loose b-file format), so the file or its part can be easily copied and pasted into a spreadsheet.

A231473

3 0 1
3 1 2
4 0 1
4 1 2
5 0 1
5 1 4
6 0 1
6 1 4
6 2 4
7 0 1
7 0 6
7 0 9
# …

A143796

0 0 1
0 1 2
0 2 3
0 3 4
0 4 5
0 5 6
0 6 7
0 7 8
0 8 9
0 9 10
1 0 2
1 1 3
1 2 4
1 3 5
1 4 6
1 5 7
1 6 8
1 7 9
1 8 10
1 9 11
2 0 3
2 1 5
2 2 7
2 3 9
2 4 11
2 5 13
2 6 15
2 7 17
2 8 19
2 9 21
3 0 5
3 1 13
3 2 29
3 3 61
3 4 125
3 5 253
3 6 509
3 7 1021
3 8 2045
3 9 4093
4 0 13
4 1 65533
5 0 65533

A054760

# according to Brouwer
2 3	3
2 4	4
2 5	5
2 6	6
2 7	7
2 8	8
2 9	9
2 10	10
2 11	11
2 12	12
3 3	4
3 4	6
3 5	10
3 6	14
3 7	24
3 8	30
3 9	58
3 10	70
3 11	112
3 12	126
4 3	5
4 4	8
4 5	19
4 6	26
4 7	67
4 8	80
4 12	728
5 3	6
5 4	10
5 5	30
5 6	42
5 8	170
5 12	2730
6 3	7
6 4	12
6 5	40
6 6	62
6 8	312
6 12	7812
7 3	8
7 4	14
7 5	50
7 6	90

A035513

0 0	1
0 1	2
0 2	3
1 0	4
0 3	5
2 0	6
1 1	7
0 4	8
3 0	9
2 1	10
# …

A046816

   0 0 0 1 # slice 0
# slice 1:
1 0 0	1 
0 1 0	1 
0 0 1	1 
# slice 2 (natural order of lines for this case):
2 0 0	1
0 2 0	1
0 0 2	1
1 1 0	2
1 0 1	2
0 1 1	2
# slice 3 (lexicographically descending order of lines):
3 0 0	1
2 1 0	3
2 0 1	3
1 2 0	3
1 1 1	6
1 0 2	3
0 3 0	1
0 2 1	3
0 1 2	3
0 0 3	1
# …

A113198

0 0 1  1
1 1 0  1
2 2 2  2
3 2 0  2
3 3 2  4
4 2 0  2
4 3 0  2
4 3 2  8
4 4 2  8
5 2 0  1
5 3 0  8
5 3 2  12
5 3 3  4
5 4 0  4
5 4 2  28
5 5 2  16

A189225

# …
4 0 0 0    1
4 1 0 0    4
4 1 1 0    4
4 1 1 1    4
4 2 0 0    6
4 2 1 0    12
4 2 1 1    12
4 2 2 0    6
4 2 2 1    12
4 2 2 2    6
4 3 0 0    4
4 3 1 0    12
4 3 1 1    12
4 3 2 0    12
4 3 2 1    24
4 3 2 2    12
4 3 3 0    4
4 3 3 1    12
4 3 3 2    12
4 3 3 3    4
4 4 0 0    1
4 4 1 0    4
4 4 1 1    4
4 4 2 0    6
4 4 2 1    12
4 4 2 2    6
4 4 3 0    4
4 4 3 1    12
4 4 3 2    12
4 4 3 3    4
4 4 4 0    1
4 4 4 1    4
4 4 4 2    6
4 4 4 3    4
4 4 4 4    1

A191358

# Since sets of indices of variable length are not allowed, the terms in every subsequence P(s, r) are indexed linearly.
# …
5 4 69  4
5 4 70  1
4 5 1   1
4 5 2   5
4 5 3   5
4 5 4   5
4 5 5   10
# …

Example of a parser for t-files

This Python script is supposed to work with any valid t-file. Should work for both Python 2 and 3. Note the simplicity of the parse_unsafe function.

def parse(f):
    '''Parse a t-file into a dict.

Assuming that f is a t-file opened for reading,
parses it into a dict, where keys are tuples of indices.
Returns None if f is not a valid t-file.'''
    d = None
    a = {}
    for line in f:
        if '#' in line:
            line = line[:line.index('#')]
        line = line.strip().replace(u'\u2212', '-')
        if not line:
            continue
        try:
            numbers = tuple(int(x) for x in line.split())
        except ValueError:
            return None
        if d is None:
            d = len(numbers) - 1
            if d < 1:
                return None
        elif len(numbers) - 1 != d:
            return None
        indices, value = numbers[:d], numbers[d]
        if indices in a:
            return None
        a[indices] = value
    return a if a else None

def parse_unsafe(f):
    '''Parse a valid t-file into a dict.

Assuming that f is a valid t-file with hyphen-minuses 
opened for reading, parses it into a dict, where keys are tuples of indices. 
Fails or returns garbage if f is not a valid t-file.'''
    a = {}
    for line in f:
        if '#' in line:
            line = line[:line.index('#')]
        line = line.strip()
        if not line:
            continue
        numbers = tuple(int(x) for x in line.split())
        indices, value = numbers[:-1], numbers[-1]
        a[indices] = value
    return a

def print2Darray(a):
    '''Print a 2D array stored as dict.

Assuming that a is a t-file parsed by parse() 
describing a two-dimensional array, prints it as a table. 
The terms are assumed to be shorter than one tabulation. 
Some elements of the array may be absent.'''
    minrow, maxrow, mincol, maxcol = (
        min(i[0] for i in a),
        max(i[0] for i in a),
        min(i[1] for i in a),
        max(i[1] for i in a)
    )
    for row in range(minrow, maxrow+1):
        line = ''
        for col in range(minrow, maxcol+1):
            i = (row, col)
            if i in a:
                line += str(a[i])
            line += '\t'
        print(line)

def nestedlist(a):
    '''Convert an array stored as dict to a nested list.'''
    for i in a:
        d = len(i)
        break
    minindices = [min(i[dim] for i in a) for dim in range(d)]
    result = []
    for i in a:
        add_element(result, a[i], i, minindices)
    return result

def add_element(a, el, ind, minind):
    '''An auxilary recursive function: add an element to a nested list.'''
    ind, ind2 = ind[0]-minind[0], ind[1:]
    if ind < 0:
        raise ValueError
    if ind2:
        if len(a) <= ind:
            a += [[] for _ in range(ind - len(a) + 1)]
        add_element(a[ind], el, ind2, minind[1:])
    else:
        if len(a) <= ind:
            a += [None] * (ind - len(a) + 1)
        a[ind] = el

with open('t-file.txt') as f:
    a = parse(f)
if a is None:
    print('Error while parsing t-file')
else:
    for i in a:
        d = len(i)
        break
    if d == 2:
        print2Darray(a)

Possible extensions via special comments

There are many extensions that could be thought of. Some of them could even be made mandatory during the initial discussion of the format (this would break the compatibility to the b-files though).

  • A comment of the form #d=<number> may be placed before any data lines to indicate the dimension of the array.
  • A comment of the form #A000001 may indicate the A-number of the sequence.
  • A comment #full may indicate that the t-file contains the complete sequence.
  • It would be good to have a way to distinguish between rows of an array that are infinite (but have finite data available, of course) and rows which are actually finite.

Possible alternatives

Although this t-file proposal have the advantage of compatibility with the b-file format and relative simplicity, perhaps a better option would be to use a restricted version of some existing, widely used format like CSV.