login
Number of n-dimensional hypotheses allowing for conditional independence.
(Formerly M4725)
6

%I M4725 #68 Jan 15 2024 17:17:31

%S 0,0,1,10,70,431,2534,14820,88267,542912,3475978,23253693,162723444,

%T 1190464900,9092400633,72370378750,599168889634,5150536258735,

%U 45891028609826,423144495659912,4031842435506171,39645279656283820,401806832058661334,4192631368792015237,44992655908959220440

%N Number of n-dimensional hypotheses allowing for conditional independence.

%C From _Petros Hadjicostas_, Oct 10 2019: (Start)

%C Mallows (1979) comments that I. J. Good did not consider all kinds of independence between n random variables in deriving his formula for the e.g.f. of a(n). Unfortunately, the paper by Good (1975), where the proof of the e.g.f. was published, is not easily accessible.

%C We give a sketch of a proof of the formula below (using only the ideas of independence that I. J. Good considered). Given r.v.'s X_1, X_2, ..., X_n, we choose s of them on which to condition (where s = 0, 1, 2, ..., n-2). By "condition", we mean that we condition on every possible s-tuple of values of those chosen variables. This can be done in C(n, s) ways.

%C Note that we can only condition on up to n-2 variables, because we need at least two variables to define any kind of independence: conditional (s >= 1) or unconditional (s = 0). Thus, 2 <= n-s <= n.

%C From the remaining n-s variables, we choose t of them (where 2 <= t <= n-s) on which we will define (or test) independence. [According to Mallows (1979), this is the only kind of independence Good (1975) considers.] There are C(n-s, t) ways to choose the t variables on which to define (or test) independence.

%C Now, there are Bell(t) - 1 = A058692(t) ways to partition the set of t chosen variables into one or more subsets, say {S_1, ..., S_r} (the order of the subsets is not important). Here |S_i| >= 1 and (S_1 union S_2 union ... union S_r) equals the t chosen variables. Thus, there are Bell(t) - 1 ways to factor the joint p.d.f. of the chosen t variables into the product of the joint marginal p.d.f.'s of the variables in S_i (i = 1, ..., r). [By "joint marginal p.d.f." for group S_i we mean the joint p.d.f of all the variables in group S_i.] Each such factorization defines a different kind of independence of the chosen variables (conditional on the original chosen s variables).

%C Thus, in the sense of Good (1975, 1979), there are a(n) = Sum_{s = 0..n-2} Sum_{t = 2..n-s} C(n, s) * C(n-s, t) * (Bell(t) - 1) kinds of independence among the variables.

%C Letting k = n-s, we get a(n) = Sum_{k = 2..n} Sum_{t = 2..k} C(n, n-k) * C(k, t) * (Bell(t) - 1) = Sum_{k = 2..n} Sum_{t = 2..k} C(n, k) * C(k, t) * (Bell(t) - 1).

%C The second formula for a(n) follows from the fact that Sum_{m = 2..k} binomial(k,m) * (Bell(m) - 1) = A058681(k) = Bell(k+1) - 2^k.

%C Counting all kinds of independence, as suggested by Mallows (1979), seems to be a very difficult task. For example, for n = 3, he finds a(3) = 17 kinds of independence rather than 10.

%C (End)

%D N. J. A. Sloane and Simon Plouffe, The Encyclopedia of Integer Sequences, Academic Press, 1995 (includes this sequence).

%H Vincenzo Librandi, <a href="/A005465/b005465.txt">Table of n, a(n) for n = 0..300</a>

%H I. J. Good, <a href="/A005465/a005465.pdf">The number of hypotheses of independence for a random vector or for a multidimensional contingency table, and the Bell numbers</a>, Iranian J. Science and Technology, 4, (1975), 77-83.

%H I. J. Good, <a href="http://www.jstor.org/stable/2958586">On the application of symmetric Dirichlet distributions and their mixtures to contingency tables</a>, Ann. Statist. 4 (1976), no. 6, 1159-1189.

%H C. L. Mallows, <a href="https://doi.org/10.1080/00949657908810319">How many independence hypotheses are there?</a>, J. Statist. Comput. Simul. 9 (1979), 235-236.

%H Wikipedia, <a href="https://en.wikipedia.org/wiki/I._J._Good">I. J. Good</a>.

%F E.g.f.: exp(exp(x)+2*x-1) - exp(3*x).

%F From _Petros Hadjicostas_, Oct 10 2019: (Start)

%F a(n) = Sum_{k = 2..n} Sum_{m = 2..k} binomial(n,k) * binomial(k,m) * (Bell(m) - 1), where Bell(m) = A000110(m) and Bell(m) - 1 = A058692(m).

%F a(n) = Sum_{k = 2..n} (Bell(k+1) - 2^k) * binomial(n,k) = Sum_{k = 2..n} A058681(k)*binomial(n,k). (End)

%F From _G. C. Greubel_, Feb 23 2022: (Start)

%F a(n) = Sum_{j=0..n} ( binomial(n,j)*2^j*Bell(n-j) ) - 3^n [Good, Iranian J. Sci. Tech., pg. 80, eq (10)].

%F a(n) = Bell(n+2) - Bell(n+1) - 3^n. (End)

%e From _Petros Hadjicostas_, Oct 10 2019: (Start)

%e For two r.v.'s X and Y, there is one kind of independence: f(x,y) = f(x)*f(y) (where f denotes a p.d.f., joint or marginal). Thus, a(2) = 1.

%e For three r.v.'s X, Y, and Z, we have

%e (i) 3 pairwise independence relations (f(x,y) = f(x)*f(y), or f(x,z) = f(x)*f(z), or f(y,z) = f(y)*f(z));

%e (ii) a factorization of the form f(x,y,z) = f(x)*f(y)*f(z);

%e (iii) 3 factorizations of the form f(x,y,z) = f(x,y)*f(z), or f(x,y,z) = f(y,z)*f(x), or f(x,y,z) = f(x,z)*f(y); [e.g. f(x,y,z) = f(x,y)*f(z) means random vector (X,Y) is jointly independent of variable Z]

%e (iv) 3 conditional factorizations f(x,y|z) = f(x|z)*f(y|z), or f(y,z|x) = f(y|x)*f(z|x), or f(x,z|y) = f(x|y)*f(z|y).

%e Hence, a(3) = 3 + 1 + 3 + 3 = 10. (See Mallows (1979) for some kinds of independence not considered by Good (1975, 1976).)

%e For four variables X, Y, Z, W, we have several cases:

%e (i) If we condition on a single variable, then we have 3 + 1 + 3 = 7 conditional independence cases (cases (i), (ii), and (iii) for n = 3) --> a total of C(4,1)*7 = 28.

%e (ii) If we condition on 2 variables, we have 1 kind of independence; i.e., a total of C(4,2)*1 = 6.

%e (iii) If we do not condition on any variables, we may consider:

%e (A) the independence of every two r.v.'s --> C(4,2) = 6 cases.

%e (B) the independence of every three variables: 1 + 3 = 4 factorizations (see the unconditional cases (ii) and (iii) above for the case n=3) --> a total of C(4,3)*4 = 16 factorizations.

%e (C) the factorizations f(x,y,z,w) = f(x)*f(y)*f(z)*f(w), or = f(x,y,z)*f(z), or = f(x,z,w)*f(y), or = f(x,y,w)*f(z), or f(y,z,w)*f(x), or = f(x,y)*f(z,w), or = f(x,z)*f(y,w), or = f(x,w)*f(y,z), or = f(x,w)*f(y)*f(z), etc. --> a total of 15-1 = 14 factorizations.

%e Hence, a(4) = 28 + 6 + 6 + 16 + 14 = 70.

%e (End)

%t With[{nn=30},CoefficientList[Series[Exp[Exp[x]+2x-1]-Exp[3x],{x,0,nn}],x] Range[0,nn]!] (* _Harvey P. Dale_, Nov 04 2015 *)

%o (Magma) [Bell(n+2) -Bell(n+1) -3^n: n in [0..30]]; // _G. C. Greubel_, Feb 23 2022

%o (Sage) [bell_number(n+2) -bell_number(n+1) -3^n for n in (0..30)] # _G. C. Greubel_, Feb 23 2022

%Y Cf. A000110, A058681, A058692.

%K nonn,nice

%O 0,4

%A _N. J. A. Sloane_

%E More terms from _N. J. A. Sloane_, Jun 26 2015