Foundations of Data Science, June 2020, 2(2): 155-172. doi: 10.3934/fods.2020009

A Bayesian nonparametric test for conditional independence

Onur Teymur and Sarah Filippi

Department of Mathematics, Imperial College London, UK

Published July 2020

Fund Project: Supported by EPSRC grant EP/R013519/1

This article introduces a Bayesian nonparametric method for quantifying the relative evidence in a dataset in favour of the dependence or independence of two variables conditional on a third. The approach uses Pólya tree priors on spaces of conditional probability densities, accounting for uncertainty in the form of the underlying distributions in a nonparametric way. The Bayesian perspective provides an inherently symmetric probability measure of conditional dependence or independence, a feature particularly advantageous in causal discovery and not employed in existing procedures of this type.

Citation: Onur Teymur, Sarah Filippi. A Bayesian nonparametric test for conditional independence. Foundations of Data Science, 2020, 2 (2) : 155-172. doi: 10.3934/fods.2020009


Figure 1.  Construction of a Pólya tree distribution on $ \Omega = [0,1] $. From each set $ C_\ast $, a particle of probability mass passes to the left with (random) probability $ \theta_{\ast0} $ and to the right with probability $ \theta_{\ast1} = 1-\theta_{\ast0} $, with all $ \theta_\ast $ being independently Beta-distributed as described in the main text
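The recursive mass-splitting described in this caption is straightforward to sketch in code. The following is an illustrative sketch only, not the paper's implementation: the function name is ours, and the choice $ \alpha = cj^2 $ for the Beta parameters at level $ j $ is a standard default (it yields absolutely continuous random distributions) rather than a detail taken from this caption. The sketch draws one random distribution from a finite Pólya tree by propagating mass down the dyadic partition of $ [0,1] $.

```python
import random

def sample_polya_tree(depth=4, c=1.0, seed=0):
    """Sample one random distribution from a finite Polya tree on [0, 1).

    At each node, a Beta-distributed splitting probability theta sends mass
    to the left child, and 1 - theta to the right, mirroring Figure 1. At
    level j the Beta parameters are c * j**2 (a common default choice).
    Returns the masses of the 2**depth dyadic intervals at the final level.
    """
    rng = random.Random(seed)
    masses = [1.0]  # level 0: all mass sits in the root interval [0, 1)
    for j in range(1, depth + 1):
        a = c * j * j
        next_masses = []
        for m in masses:
            theta = rng.betavariate(a, a)  # random probability of going left
            next_masses.extend([m * theta, m * (1.0 - theta)])
        masses = next_masses
    return masses  # masses[k] = P([k / 2**depth, (k + 1) / 2**depth))

probs = sample_polya_tree(depth=4)
print(len(probs), round(sum(probs), 6))  # 16 intervals carrying unit mass
```

Here `masses[k]` is the random probability assigned to the $ k $-th dyadic interval at the final level; summing over $ k $ recovers the unit mass placed at the root.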
Figure 2.  Pseudocode for the proposed Bayesian nonparametric test for conditional independence
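Figure 2 itself is not reproduced in this text. Schematically, the test computes marginal likelihoods of the data $ W $ under $ H_0 $ (conditional independence) and $ H_1 $ (conditional dependence) and converts their ratio, the Bayes factor, into the posterior probability $ p(H_1|W) $ reported throughout. Below is a minimal sketch of that final combination step only, assuming the two log marginal likelihoods have already been computed; the numeric values are placeholders, not results from the paper.

```python
import math

def posterior_prob_h1(log_ml_h1, log_ml_h0, prior_h1=0.5):
    """Convert log marginal likelihoods under H1 (conditional dependence)
    and H0 (conditional independence) into p(H1 | W), given a prior
    p(H1) = prior_h1. Working on the log scale avoids underflow."""
    log_odds = (log_ml_h1 - log_ml_h0) + math.log(prior_h1 / (1.0 - prior_h1))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Placeholder evidences: equal evidence gives 0.5 under a flat prior.
print(posterior_prob_h1(-100.0, -100.0))  # 0.5
print(round(posterior_prob_h1(-95.0, -100.0), 3))  # strong evidence for H1
```

Swapping the two evidences maps $ p $ to $ 1-p $, reflecting the inherently symmetric treatment of dependence and independence noted in the abstract.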
Figure 3.  Application of the proposed Bayesian testing procedure to four synthetic datasets supported on $ [0,1]^3 $, chosen such that all combinations of unconditional and conditional dependence/independence are represented. The final column gives the ensemble of probabilities of conditional dependence $ p(H_1|W) $ output by the test over 100 repetitions at varying values of data size $ N $, with the blue line representing the median, and the dark and light shaded regions representing the (25, 75)-percentile and (5, 95)-percentile ranges respectively
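The exact generating mechanisms behind these four synthetic datasets are not reproduced in this caption, but illustrative generators covering the same four combinations of marginal and conditional (in)dependence are easy to write. In the sketch below every functional form, noise scale, and label is our own assumption, not the paper's.

```python
import random

def sample_triple(kind, rng):
    """One illustrative draw from [0,1]^3 for each combination of marginal
    and conditional (in)dependence between X and Y (not the paper's models)."""
    def clip(t):
        return min(max(t, 0.0), 1.0)
    z = rng.random()
    if kind == "indep, indep|z":      # X and Y independent, with or without Z
        x, y = rng.random(), rng.random()
    elif kind == "dep, indep|z":      # dependent through Z only (confounder)
        x = clip(z + 0.1 * rng.gauss(0.0, 1.0))
        y = clip(z + 0.1 * rng.gauss(0.0, 1.0))
    elif kind == "dep, dep|z":        # directly dependent; Z plays no role
        x = rng.random()
        y = clip(x + 0.1 * rng.gauss(0.0, 1.0))
    elif kind == "indep, dep|z":      # Z is a common effect of X and Y
        x, y = rng.random(), rng.random()
        z = clip(0.5 * (x + y) + 0.05 * rng.gauss(0.0, 1.0))
    else:
        raise ValueError(kind)
    return x, y, z

rng = random.Random(1)
data = [sample_triple("dep, indep|z", rng) for _ in range(100)]
print(all(0.0 <= v <= 1.0 for triple in data for v in triple))
```

The last case makes $ Z $ a noisy average of $ X $ and $ Y $ (a common-effect, or collider, structure), so $ X $ and $ Y $ are marginally independent yet conditionally dependent given $ Z $.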
Figure 4.  Marginal scatter plots from the CalCOFI Bottle dataset showing the pairwise relationships between $\texttt{Salnty}$, $\texttt{Oxy_µmol.Kg}$ and $\texttt{T_degC}$. The nonlinear nature of the dependences is immediately apparent
Figure 5.  Example pairwise dependence graphs output by the Bayesian conditional independence test for five variables from the CalCOFI dataset, conditional on $\texttt{T_degC}$, for four different sizes of subsample drawn from the complete dataset. The numbers associated with each edge are the posterior probabilities of conditional dependence $ p(H_1|W^{(N)}) $ and are given to two decimal places; where no edge is shown, this indicates $ p(H_1|W^{(N)})<0.005 $
Figure 6.  Box-plots giving the output posterior probability of conditional dependence $ p(H_1|W^{(N)}) $ for 100 repetitions of the Bayesian conditional independence test applied to randomly-drawn subsamples of various sizes $ N $ from the CalCOFI dataset. The left-hand plot gives a representative example of a pair of variables conditionally dependent given $\texttt{T_degC}$, while the right-hand plot gives a representative conditionally independent pair
Figure 7.  Top left: Heat map of conditional marginal likelihood values for the three constituent models over $ \Omega_X $, $ \Omega_Y $ and $ \Omega_{XY} $ for the second and third models of Figure 3. Top right: 'Slices' from this heat map with $ \rho = 0.5 $. Bottom: Test outputs for 100 repetitions of the second and third models of Figure 3. Red plots fix $ c = 1 $ (output identical to Figure 3), while the blue plots use the optimising values $ \hat{c} $ from the plot above