D-Variate Independence Testing

Here, we consider testing the joint independence of \(d\) random variables. This is a harder problem than pairwise independence testing, but it is useful when the question is whether three or more groups of variables affect one another. Joint independence can be tested by combining pairwise independence tests, but a \(d\)-variate independence test is generally faster.
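To illustrate why joint independence is a stronger property than independence of every pair of variables, the sketch below uses the classic XOR construction with plain NumPy. It is not part of hyppo; the variable names and the helper `max_pairwise_gap` are hypothetical and chosen only for this example.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.integers(0, 2, size=n)
y = rng.integers(0, 2, size=n)
z = x ^ y  # determined by (x, y) jointly, yet independent of each one alone


def max_pairwise_gap(a, b):
    """Largest |P(a=i, b=j) - P(a=i) P(b=j)| over the four binary cells."""
    return max(
        abs(np.mean((a == i) & (b == j)) - np.mean(a == i) * np.mean(b == j))
        for i, j in itertools.product((0, 1), repeat=2)
    )


# Every pair looks independent: the empirical gaps are close to zero.
pairwise_gaps = [max_pairwise_gap(a, b) for a, b in ((x, y), (x, z), (y, z))]

# The three variables are not jointly independent:
# P(x=0, y=0, z=0) is about 0.25, while the product of marginals is about 0.125.
joint_gap = abs(
    np.mean((x == 0) & (y == 0) & (z == 0))
    - np.mean(x == 0) * np.mean(y == 0) * np.mean(z == 0)
)
print(pairwise_gaps, joint_gap)
```

In this construction, checking each pair of variables separately would miss the dependence, whereas a test of joint independence can detect it.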

The \(d\)-variate independence test is found in hyppo.d_variate and is explained in detail below. Like all the other tests in hyppo, it has a statistic method and a test method. The test method returns the test statistic and p-value, among other outputs, and is the one used most often in the examples and tutorials. The p-value is computed with a permutation test using hyppo.tools.multi_perm_test.
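The general idea of a permutation p-value for a \(d\)-variate statistic can be sketched as follows. This is a minimal illustration in plain NumPy and is not the implementation of hyppo.tools.multi_perm_test. The function names `perm_pvalue` and `max_abs_corr`, the choice to keep the first sample fixed, and the toy statistic are all assumptions made for this example.

```python
import numpy as np


def perm_pvalue(stat_fn, *samples, reps=200, seed=0):
    """Permutation p-value for a d-variate statistic (illustrative only)."""
    rng = np.random.default_rng(seed)
    observed = stat_fn(*samples)
    null = np.empty(reps)
    for r in range(reps):
        # Keep the first sample fixed and shuffle the rows of every other
        # sample independently, which breaks dependence between variables.
        shuffled = [samples[0]] + [s[rng.permutation(len(s))] for s in samples[1:]]
        null[r] = stat_fn(*shuffled)
    # The +1 terms count the observed statistic as one of the permutations,
    # so the p-value is never exactly zero.
    return observed, (1 + np.sum(null >= observed)) / (1 + reps)


def max_abs_corr(*samples):
    """Toy statistic: largest absolute pairwise Pearson correlation."""
    corr = np.corrcoef(np.column_stack(samples), rowvar=False)
    return np.max(np.abs(corr[np.triu_indices_from(corr, k=1)]))


rng = np.random.default_rng(1)
x = rng.normal(size=(100, 1))
y = x + 0.1 * rng.normal(size=(100, 1))  # strongly dependent on x
z = rng.normal(size=(100, 1))            # independent of both
stat, pvalue = perm_pvalue(max_abs_corr, x, y, z)
print(stat, pvalue)
```

Any statistic that accepts \(d\) samples can be passed as `stat_fn`; the correlation-based statistic here only detects linear dependence and stands in for a kernel statistic to keep the sketch short.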

Specifics about how the statistic is calculated in hyppo.d_variate can be found in the docstring of the test. Here, we give an overview of the \(d\)-variate independence test we offer in hyppo and some of its properties compared to those in hyppo.independence.

d-variate Hilbert-Schmidt Independence Criterion (dHsic)

dHsic is an extension of hyppo.independence.Hsic that uses a reproducing kernel Hilbert space to test the joint independence of \(d\) random variables. More details can be found in hyppo.d_variate.dHsic. Note that, unlike hyppo.independence.Hsic, dHsic has no fast version; it always uses the permutation method to compute its p-value.
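For intuition, the biased (V-statistic) dHSIC estimate can be written with Gram matrices \(K^1, \dots, K^d\) as the mean of their elementwise product, plus the product of their means, minus twice the mean over rows of the product of their row means. The sketch below implements that formula in plain NumPy. It is a hedged illustration rather than hyppo's code: the function names `gaussian_gram` and `dhsic_stat` are hypothetical, and the kernel parametrization \(\exp(-\gamma \lVert x_i - x_j \rVert^2)\) is an assumption about what gamma means here.

```python
import numpy as np


def gaussian_gram(x, gamma=0.5):
    """Gram matrix with entries exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(x**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * x @ x.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def dhsic_stat(*samples, gamma=0.5):
    """Biased (V-statistic) dHSIC estimate for d samples of shape (n, p_k)."""
    grams = [gaussian_gram(s, gamma) for s in samples]
    n = grams[0].shape[0]
    term1 = np.ones((n, n))  # running elementwise product of Gram matrices
    term2 = 1.0              # running product of Gram-matrix means
    term3 = np.ones(n)       # running product of Gram-matrix row means
    for k in grams:
        term1 *= k
        term2 *= k.mean()
        term3 *= k.mean(axis=1)
    return term1.mean() + term2 - 2.0 * term3.mean()


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
dependent = dhsic_stat(x, x + 0.1 * rng.normal(size=(200, 1)), x**2)
independent = dhsic_stat(*(rng.normal(size=(200, 1)) for _ in range(3)))
print(dependent, independent)
```

The statistic should be noticeably larger for the dependent triple than for the three independent samples; turning it into a p-value still requires a permutation step.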

Note

Pros
  • Highly accurate independence test for \(d\) random variables

  • Much faster than constructing a joint independence test from multiple pairwise independence tests

Cons
  • Is not always more powerful than pairwise Hsic; this depends on the simulation and the dependence structure of the variables

dHsic is often computationally cheaper than a pairwise Hsic approach. In addition, when the number of variables \(d\) is large, a pairwise Hsic approach may fail to reject the null hypothesis.

The following is a typical use of dHsic on simulated data in which \(X\) and \(Y\) follow a 1D linear relationship, as do \(U\) and \(V\). Note that here we use the default Gaussian kernel with a gamma value of 0.5. For a full list of parameters, see hyppo.d_variate.dHsic.

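from hyppo.d_variate import dHsic
from hyppo.tools import linear

# Simulate two linear relationships, each with 100 samples in 1 dimension
x, y = linear(100, 1)
u, v = linear(100, 1)

# Test the joint independence of all four variables with a Gaussian kernel
stat, pvalue = dHsic(gamma=0.5).test(x, y, u, v)
print(stat, pvalue)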
