A263129 - OEIS

A263129

Numbers whose binary representation is the concatenation of all the words in one of its possible Huffman encodings.

1024538, 1024675, 1024789, 1024837, 1024936, 1025347, 1025378, 1025384, 1026593, 10234987, 10236597, 10236758, 10258346, 10259347, 10267845, 10278534, 10283546, 10293486, 10293675, 10294837

(list; graph; refs; listen; history; text; internal format)

OFFSET

1,1

COMMENTS

This sequence is obtained using Huffman coding with decimal numbers as input.

Huffman coding is largely used for lossless data compression and with its algorithm we obtain "Huffman trees" whose branches generate sets of "0-1 words" (codewords), all different, each of them corresponding to a different input symbol.

The Huffman tree is constructed in such a way that the shortest codewords in the output sets match the more frequent symbol in the input (this is the point of Huffman compression).

Moreover, Huffman coding is structured for using codewords that avoid ambiguities (useful for decompression algorithm). E.g., if we have the word "01" in a set, we don't have "010" or "011" (or other longer words that start with "01") in the same set.

In our case, an output codeword set depends especially on the number of different digits and weight of each of them.

So if, for every number n, joining each codeword of one of its output codeword sets in the corresponding decimal digit position, we obtain a "bitstream" that matches to the same n input number binary representation, the number is added to the sequence.

Now, a brief analysis of the first few terms in the sequence.

Huf(Dec[6.0])=16 means "All 6-decimal-digit numbers, 0 repeated, correspond to 16 binary digits of a Huffman coding output". A wordset is, e.g., [00, 01, 100, 101, 110, 111].

But the lowest Dec[6.0] number is 102345, that is, the 17 binary digits 11000111111001001 (i.e., Bin(Dec[6.0])=17).

Also, Huf(Dec[6.(1,2)])=14 (this means "All 6 digits, 1 digit two times") and lowest Bin(Dec[6.(1,2)])=17. The other cases from [6.(2,2)] to [6.(1,6)] are all unbalanced. Thus, the sequence includes no numbers that are 1 to 6 digits in length.

Since with Huf(Dec[7.0])=20, and lowest Bin(Dec[7.0])=20, the first entry of this sequence is 1024538, that is the binary number 11111010001000011010 with the wordset [10, 111, 110, 011, 010, 001, 000].

All the other [7.x] cases are unbalanced, so there are only 9 numbers of 7 decimal digits (different or not) in the sequence.

For Huf(Dec[8.0])=24, we have only the codeword set [111, 110, 101, 011, 001, 010, 100, 000], that generates 297 valid n numbers that are in the sequence (see Program).

E.g., Huf(Dec[9.(1,2)])=27 using a set like [10, 110, 000, 001, 010, 011, 1110, 1111] where the word "10" is used two times in the corresponding decimal digit position, as in 132689520 (see Example).

Is not clear if, with the growth of the length (number of decimal digits) of decimal numbers, and the corresponding growth of the length (number of binary digits) of binary numbers, Huffman coding performs a better compression, and if this sequence is finite or not.

REFERENCES

Francesco Di Matteo, Playful Sequences, Game Edizioni (forthcoming), pages 24-25.

LINKS

Francesco Di Matteo, Table of n, a(n) for n = 1..306

Slawek Ligus, Huffman tree generator

Rosettacode, Huffman coding, a large collection of Huffman coding programming routines.

Wikipedia, Huffman coding

EXAMPLE

Decimal Binary Huffman

1024538 = 11111010001000011010 = 111.110.10.001.000.011.010

1024675 = 11111010001010100011 = 111.110.100.010.101.00.011

1026593 = 11111010101000100001 = 111.110.101.01.000.100.001