Causal Discovery Using Regression-Based Conditional Independence Tests
Authors: Hao Zhang, Shuigeng Zhou, Kun Zhang, Jihong Guan
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply the proposed method to both synthetic and real data to evaluate its practical performance and compare it with KCIT, CI_PERM, and partial correlation, as well as their applications in the PC algorithm. |
| Researcher Affiliation | Academia | Shanghai Key Lab of Intelligent Information Processing, Fudan University, China. Department of Philosophy, Carnegie Mellon University, USA. Department of Computer Science & Technology, Tongji University, China {haoz15, sgzhou}@fudan.edu.cn; kunz1@cmu.edu; jhguan@tongji.edu.cn |
| Pseudocode | Yes | Algorithm 1: PC algorithm based on RCIT (PC_RCIT) |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | We finally apply RCIT to a real-world dataset used in a previous work (Fukumizu et al. 2007). The data consists of three variables: creatinine clearance (C), digoxin clearance (D), and urine flow (U). These were taken from 35 patients, and analyzed by graphical models in (Edwards 2012). |
| Dataset Splits | No | Here we consider two cases as follows. In Case I, only one variable in Z, denoted by Z1, is effective, i.e., other conditioning variables are independent of X, Y, and Z1. We generate X and Y from Z1 according to the ANM data generating procedure: they are generated as f(g(Z1)) + ε, where f and g are randomly selected from the sin, cos, tanh, square and cubic functions and are different for X and Y, and ε ∼ U(−0.2, 0.2). Hence, X ⊥ Y \| Z holds. In our simulations, Zi is i.i.d. uniform U(0, 1). In Case II, all variables in the conditioning set Z are effective in generating X and Y. We first generate the independent variables Zi, then X and Y are generated as Σᵢ fᵢ(gᵢ(Zᵢ)) + ε, where fᵢ and gᵢ are randomly selected from the sin, cos, tanh, square and cubic functions. We compare RCIT with KCIT, CI_PERM (with the standard setting of 500 bootstrap samples) and the partial correlation test in terms of both types of errors. The significance level is fixed at 0.01. Note that for a good testing method, the probability of Type I error should be as close to the significance level as possible, and the probability of Type II error should be as small as possible. To see how large they are for RCIT, we increase the dimensionality of Z and the sample size n, and repeat the tests 1000 times with random draws. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | In our implementation, we perform the regression using Gaussian Processes (Rasmussen 2006) and the unconditional independence tests of RCIT using KCIT (Zhang et al. 2011). |
| Experiment Setup | Yes | In our implementation, we perform the regression using Gaussian Processes (Rasmussen 2006) and the unconditional independence tests of RCIT using KCIT (Zhang et al. 2011). The significance level is fixed at 0.01. We generate X and Y from Z1 according to the ANM data generating procedure: they are generated as f(g(Z1)) + ε, where f and g are randomly selected from the sin, cos, tanh, square and cubic functions and are different for X and Y, and ε ∼ U(−0.2, 0.2). For significance level 0.01 and sample sizes between 25 and 400 we simulate 1000 DAGs, and evaluate the performance of different methods on discovering the causal skeleton and PDAG (including identifiable causal directions). |
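The Case I synthetic setup quoted above can be sketched as follows. This is our reading of the quoted description, not the authors' released code; the function pool, the `gen_case1` helper, and its signature are our own illustrative choices.

```python
import numpy as np

# Pool of nonlinearities named in the paper's quoted setup.
FUNCS = {
    "sin": np.sin,
    "cos": np.cos,
    "tanh": np.tanh,
    "square": np.square,
    "cubic": lambda z: z ** 3,
}

def gen_case1(n, dim_z, rng):
    """Case I sketch: only Z[:, 0] is effective; the remaining
    conditioning variables are independent noise dimensions.
    X = f(g(Z1)) + eps_x, Y = f2(g2(Z1)) + eps_y, eps ~ U(-0.2, 0.2),
    so X is independent of Y given Z by construction."""
    Z = rng.uniform(0.0, 1.0, size=(n, dim_z))  # Zi i.i.d. U(0, 1)
    z1 = Z[:, 0]
    # Randomly pick f, g separately for X and for Y.
    fx, gx, fy, gy = (FUNCS[k] for k in rng.choice(list(FUNCS), 4))
    X = fx(gx(z1)) + rng.uniform(-0.2, 0.2, n)
    Y = fy(gy(z1)) + rng.uniform(-0.2, 0.2, n)
    return X, Y, Z

rng = np.random.default_rng(0)
X, Y, Z = gen_case1(n=400, dim_z=5, rng=rng)
```

Repeating this generator 1000 times and counting rejections at level 0.01 would reproduce the Type I error estimate described in the quoted passage.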
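The regression-based test idea quoted in the setup (GP-regress X and Y on Z, then test the residuals for unconditional independence) can be illustrated with a simplified stand-in. The paper applies KCIT to the residuals; here a Pearson correlation test is substituted purely for illustration, so this sketch only detects linear residual dependence and is not the authors' RCIT.

```python
import numpy as np
from scipy import stats
from sklearn.gaussian_process import GaussianProcessRegressor

def residual_ci_pvalue(X, Y, Z, noise=1e-2):
    """Crude RCIT-style sketch: remove the GP-estimated effect of Z
    from X and Y, then test the residuals for (linear) independence.
    `noise` is a regularization guess so the GP does not interpolate."""
    Z = np.asarray(Z, dtype=float).reshape(len(X), -1)
    rx = X - GaussianProcessRegressor(alpha=noise, normalize_y=True).fit(Z, X).predict(Z)
    ry = Y - GaussianProcessRegressor(alpha=noise, normalize_y=True).fit(Z, Y).predict(Z)
    _, p = stats.pearsonr(rx, ry)
    return p

# Under the Case I model X and Y are conditionally independent given Z,
# so the residual test should typically not reject at level 0.01.
rng = np.random.default_rng(1)
z = rng.uniform(0.0, 1.0, 200)
x = np.sin(z) + rng.uniform(-0.2, 0.2, 200)
y = np.tanh(z) + rng.uniform(-0.2, 0.2, 200)
p = residual_ci_pvalue(x, y, z)
```

Swapping the final Pearson test for a kernel independence test (as the paper does with KCIT) would recover sensitivity to nonlinear residual dependence.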