login
A263129
Numbers whose binary representation is the concatenation of all the words in one of its possible Huffman encodings.
1
1024538, 1024675, 1024789, 1024837, 1024936, 1025347, 1025378, 1025384, 1026593, 10234987, 10236597, 10236758, 10258346, 10259347, 10267845, 10278534, 10283546, 10293486, 10293675, 10294837
OFFSET
1,1
COMMENTS
This sequence is obtained using Huffman coding with decimal numbers as input.
Huffman coding is largely used for lossless data compression and with its algorithm we obtain "Huffman trees" whose branches generate sets of "0-1 words" (codewords), all different, each of them corresponding to a different input symbol.
The Huffman tree is constructed in such a way that the shortest codewords in the output sets match the more frequent symbol in the input (this is the point of Huffman compression).
Moreover, Huffman coding is structured for using codewords that avoid ambiguities (useful for decompression algorithm). E.g., if we have the word "01" in a set, we don't have "010" or "011" (or other longer words that start with "01") in the same set.
In our case, an output codeword set depends especially on the number of different digits and weight of each of them.
So if, for every number n, joining each codeword of one of its output codeword sets in the corresponding decimal digit position, we obtain a "bitstream" that matches to the same n input number binary representation, the number is added to the sequence.
Now, a brief analysis of the first few terms in the sequence.
Huf(Dec[6.0])=16 means "All 6-decimal-digit numbers, 0 repeated, correspond to 16 binary digits of a Huffman coding output". A wordset is, e.g., [00, 01, 100, 101, 110, 111].
But the lowest Dec[6.0] number is 102345, that is, the 17 binary digits 11000111111001001 (i.e., Bin(Dec[6.0])=17).
Also, Huf(Dec[6.(1,2)])=14 (this means "All 6 digits, 1 digit two times") and lowest Bin(Dec[6.(1,2)])=17. The other cases from [6.(2,2)] to [6.(1,6)] are all unbalanced. Thus, the sequence includes no numbers that are 1 to 6 digits in length.
Since with Huf(Dec[7.0])=20, and lowest Bin(Dec[7.0])=20, the first entry of this sequence is 1024538, that is the binary number 11111010001000011010 with the wordset [10, 111, 110, 011, 010, 001, 000].
All the other [7.x] cases are unbalanced, so there are only 9 numbers of 7 decimal digits (different or not) in the sequence.
For Huf(Dec[8.0])=24, we have only the codeword set [111, 110, 101, 011, 001, 010, 100, 000], that generates 297 valid n numbers that are in the sequence (see Program).
E.g., Huf(Dec[9.(1,2)])=27 using a set like [10, 110, 000, 001, 010, 011, 1110, 1111] where the word "10" is used two times in the corresponding decimal digit position, as in 132689520 (see Example).
Is not clear if, with the growth of the length (number of decimal digits) of decimal numbers, and the corresponding growth of the length (number of binary digits) of binary numbers, Huffman coding performs a better compression, and if this sequence is finite or not.
REFERENCES
Francesco Di Matteo, Playful Sequences, Game Edizioni (forthcoming), pages 24-25.
LINKS
Francesco Di Matteo, Table of n, a(n) for n = 1..306
Rosettacode, Huffman coding, a large collection of Huffman coding programming routines.
Wikipedia, Huffman coding
EXAMPLE
Decimal Binary Huffman
1024538 = 11111010001000011010 = 111.110.10.001.000.011.010
1024675 = 11111010001010100011 = 111.110.100.010.101.00.011
..
1026593 = 11111010101000100001 = 111.110.101.01.000.100.001
10234987 = 100111000010110001101011 = 100.111.000.010.110.001.101.011
10236597 = 100111000011001010110101 = 100.111.000.011.001.010.110.101
..
132689520 = 111111010001010111001110000 = 1111.110.10.001.010.1110.011.10.000
..
369752480 = 10110000010011111100110100000 = 101.100.0001.001.111.110.011.010.0000
etc.
PROG
(Python)
import itertools
decimal = []
huf = ['000', '001', '010', '011', '100', '101', '110', '111']
# This huf list has been processed by a Huffman coding function.
# In order to find all the eight n different digits numbers, as I say
# in Comments, this is the only Huffman encoding valid wordset, so this
# is the simplest example routine.
ndec = list(map("".join, itertools.permutations('0123456789', len(huf))))
nbin = list(map("".join, itertools.permutations(huf)))
for item in nbin:
decimal.append(str(int(item, 2)))
n = set(decimal).intersection(set(ndec))
print(sorted(n), len(n))
# Francesco Di Matteo, Oct 18 2015
CROSSREFS
Cf. A126014.
Sequence in context: A235159 A235695 A206053 * A241790 A143133 A235106
KEYWORD
nonn,base,fini
AUTHOR
Francesco Di Matteo, Oct 10 2015
STATUS
approved