Causal Discovery Using Regression-Based Conditional Independence Tests
Authors: Hao Zhang, Shuigeng Zhou, Kun Zhang, Jihong Guan
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply the proposed method to both synthetic and real data to evaluate its practical performance and compare it with KCIT, CI_PERM, and partial correlation, as well as their applications in the PC algorithm. |
| Researcher Affiliation | Academia | Shanghai Key Lab of Intelligent Information Processing, Fudan University, China. Department of Philosophy, Carnegie Mellon University, USA. Department of Computer Science & Technology, Tongji University, China {haoz15, sgzhou}@fudan.edu.cn; kunz1@cmu.edu; jhguan@tongji.edu.cn |
| Pseudocode | Yes | Algorithm 1: PC algorithm based on RCIT (PC_RCIT) |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | We finally apply RCIT to a real-world dataset used in a previous work (Fukumizu et al. 2007). The data consists of three variables: creatinine clearance (C), digoxin clearance (D), and urine flow (U). These were taken from 35 patients, and analyzed by graphical models in (Edwards 2012). |
| Dataset Splits | No | Here we consider two cases as follows. In Case I, only one variable in Z, denoted by Z1, is effective, i.e., other conditioning variables are independent of X, Y, and Z1. We generate X and Y from Z1 according to the ANM data generating procedure: they are generated as f(g(Z1)) + ε, where f and g are randomly selected from the sin, cos, tanh, square and cubic functions and are different for X and Y, and ε ∼ U(−0.2, 0.2). Hence, X ⊥ Y \| Z holds. In our simulations, Zi is i.i.d. uniform U(0, 1). In Case II, all variables in the conditioning set Z are effective in generating X and Y. We first generate the independent variables Zi, then X and Y are generated as Σᵢ fᵢ(gᵢ(Zᵢ)) + ε, where fᵢ and gᵢ are randomly selected from the sin, cos, tanh, square and cubic functions. We compare RCIT with KCIT, CI_PERM (with the standard setting of 500 bootstrap samples) and the partial correlation test in terms of both types of errors. The significance level is fixed at 0.01. Note that for a good testing method, the probability of Type I error should be as close to the significance level as possible, and the probability of Type II error should be as small as possible. To see how large they are for RCIT, we increase the dimensionality of Z and the sample size n, and repeat the tests 1000 times with random draws. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | In our implementation, we perform the regression using Gaussian Processes (Rasmussen 2006) and the unconditional independence tests of RCIT using KCIT (Zhang et al. 2011). |
| Experiment Setup | Yes | In our implementation, we perform the regression using Gaussian Processes (Rasmussen 2006) and the unconditional independence tests of RCIT using KCIT (Zhang et al. 2011). The significance level is fixed at 0.01. We generate X and Y from Z1 according to the ANM data generating procedure: they are generated as f(g(Z1)) + ε, where f and g are randomly selected from the sin, cos, tanh, square and cubic functions and are different for X and Y, and ε ∼ U(−0.2, 0.2). For significance level 0.01 and sample sizes between 25 and 400 we simulate 1000 DAGs, and evaluate the performance of different methods on discovering the causal skeleton and PDAG (including identifiable causal directions). |
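The Case I synthetic setup quoted above can be sketched as follows. This is our reading of the quoted description, not the authors' released code; the function pool, the `gen_case1` helper, and its signature are our own illustrative choices.

```python
import numpy as np

# Pool of nonlinearities named in the paper's quoted setup.
FUNCS = {
    "sin": np.sin,
    "cos": np.cos,
    "tanh": np.tanh,
    "square": np.square,
    "cubic": lambda z: z ** 3,
}

def gen_case1(n, dim_z, rng):
    """Case I sketch: only Z[:, 0] is effective; the remaining
    conditioning variables are independent noise dimensions.
    X = f(g(Z1)) + eps_x, Y = f2(g2(Z1)) + eps_y, eps ~ U(-0.2, 0.2),
    so X is independent of Y given Z by construction."""
    Z = rng.uniform(0.0, 1.0, size=(n, dim_z))  # Zi i.i.d. U(0, 1)
    z1 = Z[:, 0]
    # Randomly pick f, g separately for X and for Y.
    fx, gx, fy, gy = (FUNCS[k] for k in rng.choice(list(FUNCS), 4))
    X = fx(gx(z1)) + rng.uniform(-0.2, 0.2, n)
    Y = fy(gy(z1)) + rng.uniform(-0.2, 0.2, n)
    return X, Y, Z

rng = np.random.default_rng(0)
X, Y, Z = gen_case1(n=400, dim_z=5, rng=rng)
```

Repeating this generator 1000 times and counting rejections at level 0.01 would reproduce the Type I error estimate described in the quoted passage.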
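The regression-based test idea quoted in the setup (GP-regress X and Y on Z, then test the residuals for unconditional independence) can be illustrated with a simplified stand-in. The paper applies KCIT to the residuals; here a Pearson correlation test is substituted purely for illustration, so this sketch only detects linear residual dependence and is not the authors' RCIT.

```python
import numpy as np
from scipy import stats
from sklearn.gaussian_process import GaussianProcessRegressor

def residual_ci_pvalue(X, Y, Z, noise=1e-2):
    """Crude RCIT-style sketch: remove the GP-estimated effect of Z
    from X and Y, then test the residuals for (linear) independence.
    `noise` is a regularization guess so the GP does not interpolate."""
    Z = np.asarray(Z, dtype=float).reshape(len(X), -1)
    rx = X - GaussianProcessRegressor(alpha=noise, normalize_y=True).fit(Z, X).predict(Z)
    ry = Y - GaussianProcessRegressor(alpha=noise, normalize_y=True).fit(Z, Y).predict(Z)
    _, p = stats.pearsonr(rx, ry)
    return p

# Under the Case I model X and Y are conditionally independent given Z,
# so the residual test should typically not reject at level 0.01.
rng = np.random.default_rng(1)
z = rng.uniform(0.0, 1.0, 200)
x = np.sin(z) + rng.uniform(-0.2, 0.2, 200)
y = np.tanh(z) + rng.uniform(-0.2, 0.2, 200)
p = residual_ci_pvalue(x, y, z)
```

Swapping the final Pearson test for a kernel independence test (as the paper does with KCIT) would recover sensitivity to nonlinear residual dependence.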