User:Charles R Greathouse IV/Format

This page explains various formats relating to the OEIS.

Sequences

See the Style Sheet.

b-files

The following is based, in part, on the Instructions for contributing a b-file to The OEIS and Russ Cox's Jan 20 2011 SeqFan email.

The b-file format is a context-sensitive language, but in practice it is easy to verify (and can be recognized by a one-way 2-head automaton, as defined by Rosenberg). A line may be blank, a comment (starting with #), or of the form number-space-number. A number is one of

  • a digit 1-9, followed by zero or more digits 0-9; or
  • a - sign (Unicode U+002D), followed by the above; or
  • the digit 0

and the space is Unicode U+0020. Lines are terminated by a linefeed (U+000A), including the last line. (If the last character is not a linefeed—that is, if the last line is not blank—the server will not display the correct number of terms!)

Further, the lines must be in order:

  • the first number in each line must be exactly one larger than that of the preceding non-comment, non-blank line (if any), or
  • the first number in each line must be exactly one smaller than that of the preceding non-comment, non-blank line (if any).

The second format is used primarily for constants. (Can it be used legally elsewhere?) Without this restriction the language is regular. In particular the content lines may be matched with the regex

^((?:-?[1-9][0-9]*)|0) ((?:-?[1-9][0-9]*)|0)$

the comments by

^#

and the blanks by

^$

It is recommended that b-files not begin with blank lines and that content lines be contiguous. It is strongly recommended that numbers in content lines not exceed 1000 digits.

The format of files is UTF-8, but note that a Unicode byte-order mark (BOM, EF BB BF) is not allowed. Typical b-files use the ASCII subset of UTF-8: no high bits are set.
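The strict rules above can be checked mechanically. The following Python sketch is one possible checker, not the server's own validation; the function name check_strict_bfile is hypothetical, and it assumes that the direction of the index (increasing or decreasing) is fixed by the first two content lines and holds for the whole file.

```python
import re

# Content-line regex taken from the strict format described above.
CONTENT = re.compile(r"^((?:-?[1-9][0-9]*)|0) ((?:-?[1-9][0-9]*)|0)$")


def check_strict_bfile(data):
    """Return a list of problems found in the raw bytes of a b-file.

    An empty list means the file passed every check in this sketch.
    """
    problems = []
    if data.startswith(b"\xef\xbb\xbf"):
        problems.append("file starts with a byte-order mark")
    if data and not data.endswith(b"\n"):
        problems.append("last line is not terminated by a linefeed")
    text = data.decode("utf-8")  # raises UnicodeDecodeError if not UTF-8
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final linefeed
    prev = None  # index on the previous content line
    step = None  # +1 or -1 once two content lines have been seen
    for lineno, line in enumerate(lines, start=1):
        if line == "" or line.startswith("#"):
            continue  # blank lines and comments are always allowed
        m = CONTENT.match(line)
        if m is None:
            problems.append(f"line {lineno}: not blank, comment, or 'n a(n)'")
            continue
        n = int(m.group(1))
        if prev is not None:
            if step is None and abs(n - prev) == 1:
                step = n - prev  # assumption: direction is set once, here
            if n - prev != step:
                problems.append(f"line {lineno}: index {n} does not follow {prev}")
        prev = n
    return problems
```

For example, check_strict_bfile(b"1 2\n2 3\n") returns an empty list, while a file whose last line lacks the linefeed yields one problem.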

Loose b-file format

Some b-files may not match the format above, but can be transformed into it. Their content lines may be recognized as

^\s*((?:[-\x{2212}]?[1-9][0-9]*)|0)\s+((?:-?[1-9][0-9]*)|0)\s*(#.*)?$

their comment lines as

^\s*#

and their blanks as

^\s*$

Further, lines may be terminated with a carriage return (U+000D) or carriage return-linefeed combination (U+000D U+000A) rather than a linefeed (U+000A). A correct conversion

  • Replaces line terminators with a linefeed (U+000A)
  • Removes whitespace from blank lines
  • Removes whitespace in comment lines before the # character
  • Moves comments in content lines to either the preceding or following line
  • Removes leading and trailing whitespace in content lines
  • Replaces remaining whitespace in content lines with a single space
  • Replaces minus signs (U+2212) with a hyphen-minus (U+002D) in content lines
  • Replaces -0 with 0 in content lines
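The conversion steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the name to_strict is hypothetical, the pattern is slightly more permissive than the loose regex (it accepts U+2212 and -0 on either number, since the conversion list mentions both), and comments on content lines are always moved to the preceding line.

```python
import re

# Assumption: accept U+2212 and "-0" on both numbers so that the
# conversion steps for minus signs and -0 have something to act on.
LOOSE = re.compile(
    r"^\s*([-\u2212]?[1-9][0-9]*|-?0)\s+([-\u2212]?[1-9][0-9]*|-?0)\s*(#.*)?$"
)


def _normalise(number):
    """Use hyphen-minus for the sign and write -0 as 0."""
    number = number.replace("\u2212", "-")
    return "0" if number == "-0" else number


def to_strict(text):
    """Convert loose b-file text to the strict format (sketch)."""
    # Normalise CR LF and bare CR terminators to a linefeed.
    pieces = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    if pieces and pieces[-1] == "":
        pieces.pop()  # empty piece after a final terminator
    out = []
    for line in pieces:
        stripped = line.strip()
        if stripped == "":
            out.append("")  # blank line: drop any whitespace
        elif stripped.startswith("#"):
            out.append(line.lstrip())  # comment: drop whitespace before '#'
        else:
            m = LOOSE.match(line)
            if m is None:
                raise ValueError(f"not a loose b-file line: {line!r}")
            if m.group(3):
                out.append(m.group(3))  # move the comment to the preceding line
            out.append(f"{_normalise(m.group(1))} {_normalise(m.group(2))}")
    # Every line, including the last, ends with a linefeed.
    return "".join(line + "\n" for line in out)
```

The ordering requirement on the indices is not checked here; the result can be passed through a strict-format check afterwards.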

Code for working with b-files

This Pari/GP code takes a name (as a string or as a number, e.g. "b000000.txt" or 0) and a vector, plus an optional offset. It outputs a b-file along with boilerplate text for inclusion into the Encyclopedia. The created b-file follows the strict format, including the recommendation that numbers not exceed 1000 digits.

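bfile(name, v, offset=1)={
	my(cur=offset-1,Anum);
	\\ Accept either an A-number (integer) or a filename such as "b000040.txt".
	if (type(name) == "t_INT",
		\\ Left-pad the A-number with zeros to six digits.
		name = Vec(Str(name));
		while(#name < 6, name = concat(["0"], name));
		Anum = concat(name);
		name = Str("b"Anum".txt");
	,
		if (type(name) != "t_STR", error("name must be an integer (A-number) or filename (\"b000040.txt\")."));
		\\ Mask 126 = binary 1111110 selects characters 2..7, the six digits after "b".
		Anum = concat(vecextract(Vec(name), 126))
	);
	for(i=1,#v,
		\\ Stop before writing a term longer than 1000 characters (digits plus any sign).
		if (#Str(v[i]) > 1000,
			print("Next term has "#Str(v[i])" digits; exiting.");
			break
		);
		\\ write() appends one "n a(n)" line, terminated by a newline, to the file.
		write(name, cur++" "v[i]);
	);
	\\ Print the %H boilerplate covering the range of indices actually written.
	print("%H A"Anum" Author, <a href=\"b"Anum".txt\">Table of n, a(n) for n = "offset".."cur"</a>");
};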
addhelp(bfile, "bfile(name, v, offset=1): Creates a b-file with the values of v for A-number name (given as a number or a filename).");

Internal format

The following is based, in part, on eishelp1.

The OEIS internal format is a plain-text format often used for computer processing of sequences. It is much easier to parse than the standard HTML response pages, and serves as a rudimentary API for the OEIS. It can also be used to edit sequences, for cases when it is easier than the normal formatted view.

The internal form of a given sequence can be found by clicking the "text" or "internal format" link following the terms of the sequence, or accessed at http://oeis.org/Axxxxxx/internal (served as HTML) or https://oeis.org/search?q=id:Axxxxxx&fmt=text (served as plain text).
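As a small illustration, the two addresses above can be built from an A-number like this. The helper name internal_format_urls is hypothetical, and the sketch assumes the usual six-digit, zero-padded form of the A-number; it only builds strings and makes no network request.

```python
def internal_format_urls(a_number):
    """Return the two access URLs for a sequence, given its A-number as an int."""
    label = f"A{a_number:06d}"  # assumption: six-digit, zero-padded label
    return {
        "html": f"http://oeis.org/{label}/internal",
        "text": f"https://oeis.org/search?q=id:{label}&fmt=text",
    }
```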

General format

There are two kinds of internal format: the 'full' version which includes the sequence number, and the condensed version which does not. The condensed version is described below; the 'full' version (used when multiple sequences could appear) is the same, except that after the type character a space (U+0020) followed by the sequence number is inserted. The 'text' link generates the full format in text, while the 'internal format' link generates the condensed version rendered in HTML.

Each line of the internal format is essentially an independent unit. Lines are encoded in UTF-8 and are terminated with a linefeed (U+000A) character. Each line begins with a percent sign (U+0025) followed by a single (case-sensitive) character which gives the type of the line (the types are described below). That character is followed by a single space (U+0020), and any further characters are referred to below as the content of the line. The types, if present, must appear in precisely the following order: ISTUVWXNCDHFptoYKOAE.

The order of the lines within each type determines the order in which they appear but is otherwise free. (It is standard OEIS practice to order comments and formulas chronologically but this is not a requirement of the format.) The requirements for each type are given below.
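A minimal Python sketch of a reader for the condensed version follows. It splits each line into its type character and content and checks the type order given above. The name parse_condensed is hypothetical, and the sketch is lenient in one respect that the description is not: a line with no content may omit the space after the type character.

```python
TYPE_ORDER = "ISTUVWXNCDHFptoYKOAE"


def parse_condensed(text):
    """Split condensed internal format into (type, content) pairs.

    Raises ValueError on a malformed line or on types out of order.
    """
    records = []
    last_rank = 0
    for lineno, line in enumerate(text.split("\n"), start=1):
        if line == "":
            continue  # skip empty pieces, e.g. the one after the final linefeed
        if len(line) < 2 or line[0] != "%" or line[1] not in TYPE_ORDER:
            raise ValueError(f"line {lineno}: missing or unknown type")
        if line[2:3] not in ("", " "):
            raise ValueError(f"line {lineno}: type must be followed by a space")
        rank = TYPE_ORDER.index(line[1])
        if rank < last_rank:
            raise ValueError(f"line {lineno}: type %{line[1]} is out of order")
        last_rank = rank
        records.append((line[1], line[3:]))
    return records
```

The per-type requirements (for example, that %N appears exactly once) are not checked here.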

Identification line (I)

This line is required, and is usually blank (no content). If the sequence previously appeared in A Handbook of Integer Sequences (HIS) or The Encyclopedia of Integer Sequences (EIS), then the line consists of one or more space-separated M-/N-numbers. The EIS M-number is given first, then the HIS N-number. If there are multiple M- or N-numbers, they should be given in ascending order within their respective groups. At the time of writing there are only ten sequences with multiple M-numbers or multiple N-numbers: A000586, A000598, A000615, A000616, A001037, A001371, A002189, A002513, A005254, A006809. M- and N-numbers should always have four digits. Examples:

%I M0652 N0241
%I 
%I M0115 N0045 N0285
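The content of an identification line can be checked along these lines. This is only a sketch of the rules stated above (four digits, M-numbers before N-numbers, ascending order within each group); the name parse_identification is hypothetical.

```python
import re


def parse_identification(content):
    """Return (m_numbers, n_numbers) from the content of a %I line."""
    tokens = content.split()
    m_numbers = [t for t in tokens if re.fullmatch(r"M[0-9]{4}", t)]
    n_numbers = [t for t in tokens if re.fullmatch(r"N[0-9]{4}", t)]
    if len(m_numbers) + len(n_numbers) != len(tokens):
        raise ValueError("expected only four-digit M- and N-numbers")
    if tokens != m_numbers + n_numbers:
        raise ValueError("M-numbers must come before N-numbers")
    # Four-digit numbers sort correctly as strings.
    if m_numbers != sorted(m_numbers) or n_numbers != sorted(n_numbers):
        raise ValueError("numbers must be ascending within each group")
    return m_numbers, n_numbers
```

Applied to the three examples above, it returns (["M0652"], ["N0241"]), ([], []), and (["M0115"], ["N0045", "N0285"]) respectively.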

Unsigned terms (STU)

These lines give the absolute values of the initial terms of the sequence. For historical reasons, these may be split across the %S, %T, and %U lines. The terms should each be either 0 or a digit 1-9 followed by zero or more digits 0-9. If the sequence has keywords recycled, allocated, or allocating then this line should be blank; otherwise there should be at least 1 term. All terms should be separated by a comma (U+002C), not a comma and space or any other combination. It is permissible to put all the terms on the %S line, but the OEIS splits the terms so fewer than 70 characters appear on each of the first two lines. In any case,

  • Lines can be split only just after a comma
  • At least one term must appear on line %S
  • If line %T is included, at least one term must appear on line %T
  • If line %U is included, lines %S and %T must each contain at least one term

Example 1:

%S 1,3,6,8,11,13,16,18,21,23,35,38,43,48,53,58,66,68,71,73,81,86,92,97,
%T 102,107,112,118,120,125,131,133,138,144,146,151,157,159,164,189,199,
%U 203,206,208,219,223,236,242,248,253,258,263,266,269,283,285,288,293

Example 2:

%S 3,14,248,4064,16775168,4294934528,68719345664,1152921504069976064,
%T 1329227995784915872327346307976921088,
%U 95780971304118053647396689042151819065498660774084608,6582018229284824168619876730229361455111736159193471558891864064,7237005577332262213973186563042994240786838745737417944533177174565599576064
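Both examples are consistent with a greedy split in which terms are added to %S, then to %T, for as long as the content stays under 70 characters, with everything left over going to %U. The Python sketch below implements that reading. The name split_stu and the limit parameter are hypothetical, and the greedy rule is an inference from the examples rather than a documented algorithm.

```python
def split_stu(terms, limit=69):
    """Split the absolute values of terms into %S, %T, %U contents.

    Returns a list of (type, content) pairs; %T and %U are omitted when
    they would be empty. Assumption: 'fewer than 70 characters' refers
    to the content, hence limit=69.
    """
    if not terms:
        return [("S", "")]  # e.g. allocated sequences have a blank line
    tokens = [f"{abs(t)}," for t in terms]
    tokens[-1] = tokens[-1][:-1]  # no comma after the final term
    contents, i = [], 0
    for _ in range(2):  # only %S and %T are length-limited
        line = ""
        # Always take at least one term, even if it alone exceeds the limit.
        while i < len(tokens) and (line == "" or len(line) + len(tokens[i]) <= limit):
            line += tokens[i]
            i += 1
        contents.append(line)
    contents.append("".join(tokens[i:]))  # %U takes whatever is left
    return [(tag, c) for tag, c in zip("STU", contents) if c]
```

Because each token carries its trailing comma, lines are only ever split just after a comma.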

Signed terms (VWX)

These lines give the initial terms of the sequence, if the sequence is signed (keyword:sign); otherwise these must be omitted. The content of lines %V, %W, and %X must be identical, respectively, to the contents of the %S, %T, and %U lines, except that terms are prefixed with hyphen-minus (U+002D) when the terms are negative. For signed sequences the %V line is required, the %W line is required if the %T line is present, and the %X line is required if the %U line is present. Each may occur at most once.
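For a signed sequence, the correspondence between the two sets of lines can be checked term by term. The following Python sketch assumes the lines have already been collected into a dictionary from type letter to content; the name signed_lines_consistent is hypothetical.

```python
PAIRS = {"S": "V", "T": "W", "U": "X"}


def signed_lines_consistent(lines):
    """Check %V/%W/%X against %S/%T/%U for a signed sequence.

    lines maps a type letter to the content of that line.
    """
    for unsigned, signed in PAIRS.items():
        if (unsigned in lines) != (signed in lines):
            return False  # each signed line must accompany its unsigned line
        if unsigned not in lines:
            continue
        # Strip at most one leading hyphen-minus from every signed term.
        stripped = [
            term[1:] if term.startswith("-") else term
            for term in lines[signed].split(",")
        ]
        if stripped != lines[unsigned].split(","):
            return False
    return True
```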

Sequence name (N)

This line is required and may only appear once. It gives the name of the sequence.

Comments (C)

These lines are optional; zero or more may appear. They give comments on the sequence.

References (D)

These lines are optional; zero or more may appear. They give references where no link is available.

Links (H)

These lines are optional; zero or more may appear. They give links, either references with links to get the paper or other types of links. This field allows a subset of HTML. At present, it is very restricted: only the a element may appear, and it must have the href attribute and no other attributes. Perhaps in the future other attributes (e.g., rel, hreflang, title, type, translate, dir) will be allowed.

Formulas (F)

These lines are optional; zero or more may appear. They give formulas generating or otherwise relating to the sequence.

Programs (pto)

These lines are optional; zero or more may appear. The %t lines give Mathematica code, the %p lines give Maple code, and the %o lines give code in other languages. The first %o line should start with the name of the language in parentheses; further lines may do the same, in which case they begin new code blocks; otherwise they are assumed to be a continuation of the preceding block.

The program names are somewhat standardized; common languages are (PARI), (MAGMA), (Haskell), (Sage), (Maxima), (Scheme), (Python), and (GAP). See the Programs section of the Style Sheet for a list of languages and their associated preferred format for comments.
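The rule for %o lines can be sketched as follows: a line that starts with a parenthesized name opens a new code block, and any other line continues the current one. The name group_programs is hypothetical. Note the inherent ambiguity, which this sketch does not resolve: a continuation line of code that itself begins with a short parenthesized expression would be taken as the start of a new block.

```python
import re

# A language tag: "(Name)" at the start of the content, with no nested
# parentheses, optionally followed by one space and the first line of code.
LANG = re.compile(r"\(([^()]+)\) ?(.*)")


def group_programs(o_lines):
    """Group the contents of %o lines into (language, code_lines) blocks."""
    blocks = []
    for content in o_lines:
        m = LANG.fullmatch(content)
        if m:
            first = [m.group(2)] if m.group(2) else []
            blocks.append((m.group(1), first))
        elif blocks:
            blocks[-1][1].append(content)  # continuation of the current block
        else:
            raise ValueError("the first %o line must name a language")
    return blocks
```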

Cross-references (Y)

These lines are optional; zero or more may appear. They give cross-references to other sequences. By convention, sequences given without any description are prefaced with "Cf. " (Latin conferre, "compare"), although this is not a requirement of the format. Similarly, tradition suggests separating sequences in lists with a comma (U+002C) and a single space (U+0020).

This section formerly also contained automatically generated "Sequence in context" and "Adjacent sequences" cross-references, but these are no longer generated and are of historical interest only.

Keywords (K)

This line is required and may only appear once. It contains a list of one or more keywords separated by a comma (U+002C), not a comma and space or any other combination. Each keyword may appear at most once, and keywords are subject to certain restrictions; order is not important.
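A short Python sketch of the separator and uniqueness rules follows. The name parse_keywords is hypothetical, and the further restrictions on which keywords may be combined are not modelled.

```python
def parse_keywords(content):
    """Split the content of a %K line into keywords, checking the format."""
    keywords = content.split(",")
    for keyword in keywords:
        if keyword == "" or any(ch.isspace() for ch in keyword):
            raise ValueError("keywords are separated by a bare comma")
    if len(set(keywords)) != len(keywords):
        raise ValueError("a keyword may appear at most once")
    return keywords
```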

Offset (O)

This line is generally required and may only appear once. Sequences with keywords recycled, allocated, or allocating may not have this line but it is otherwise mandatory. It gives the index of the first term as an integer, which may be positive, zero, or negative; as everywhere in the OEIS, negative numbers are prefixed with hyphen-minus (U+002D). If all the terms are in {-1, 0, 1}, or the position of the first term outside this set is unknown, the field contains only this number. Otherwise, a comma (U+002C) follows, and after it the position of the first term outside {-1, 0, 1}, counting the first term given as position 1 regardless of the offset. Examples:

%O 1
%O 0,3

The first means that the first term is a(1) and either all terms are -1, 0, or 1, or else the location of the first term outside this set is unknown. The second means that the first term given is a(0) and the third term given, a(2) in this case, is the first that is not -1, 0, or 1.
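The rule can be written as a short Python sketch that builds the content of the %O line from the offset and the listed terms. The name offset_content is hypothetical; when no listed term lies outside {-1, 0, 1}, only the offset is returned, as described above.

```python
def offset_content(offset, terms):
    """Return the content of the %O line for the given offset and terms."""
    # Positions count from 1 for the first term given, whatever the offset.
    for position, term in enumerate(terms, start=1):
        if abs(term) > 1:
            return f"{offset},{position}"
    return str(offset)
```

For instance, offset_content(0, [1, 1, 2, 3]) gives "0,3", matching the second example.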

Author (A)

This line is generally required and may only appear once. Sequences with keywords recycled, allocated, or allocating may not have this line, and sequences with keyword dead need not have this line; it is otherwise mandatory. It gives the author(s) or submitter(s) of the sequence, or (exceptionally) other authorities from which the sequence derives. When an author is a registered editor of the OEIS, the author's name should appear as registered and between underscores, allowing it to be shown as a link in appropriate software. Usually the date of submission follows.
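Software that displays the line can pick out the registered names with a simple pattern. The sketch below is one way to do it, assuming that names themselves never contain an underscore; the function name registered_names and the sample line in the usage note are illustrative only.

```python
import re


def registered_names(content):
    """Return the names written between underscores in an %A line."""
    # Assumption: a registered name contains no underscore itself.
    return re.findall(r"_([^_]+)_", content)
```

For a line such as "_First Author_ and _Second Author_, Jan 01 2000" (a made-up example), this returns ["First Author", "Second Author"].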

Extensions (E)

These lines are optional; zero or more may appear. They credit extensions to the sequence not otherwise signed. See the Extensions section of the Style Sheet for more information on OEIS conventions in this regard.