Automating the Selection of Proxy Variables of Unmeasured Confounders
Authors: Feng Xie, Zhengming Chen, Shanshan Luo, Wang Miao, Ruichu Cai, Zhi Geng
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on both synthetic and real-world data show the effectiveness of the proposed approach. |
| Researcher Affiliation | Academia | (1) Department of Applied Statistics, Beijing Technology and Business University, Beijing, China; (2) School of Computer Science, Guangdong University of Technology, Guangzhou 510006, China; (3) Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE; (4) Department of Probability and Statistics, Peking University, Beijing, China. |
| Pseudocode | Yes | Algorithm 1 Proxy-Rank; Algorithm 2 Proxy-GIN |
| Open Source Code | Yes | The source code is in the Supplementary file. |
| Open Datasets | Yes | We here consider the following two typical settings: Gaussian case: The data are generated according to the causal graph in Figure 3, with the noise terms being generated from standard normal distributions; Non-Gaussian case: The data are generated according to the graph obtained by removing variable X3 from Figure 3, with the noise terms being generated from standard exponential distributions. ... We apply the proposed methods to analyze the causal effects of gene expressions on the body weight of F2 mice using the mouse obesity dataset as described by Wang et al. (2006). |
| Dataset Splits | No | The paper mentions data generation and sample sizes but does not specify train/validation/test splits or cross-validation setup for the datasets used in experiments. |
| Hardware Specification | No | The paper does not specify any hardware (e.g., GPU models, CPU types, or cloud computing instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions using T. W. Anderson's canonical correlation-based rank test and HSIC-based independence test, but it does not specify any software packages, libraries, or their version numbers used for implementation. |
| Experiment Setup | Yes | The data are generated according to the causal graph in Figure 3, with the noise terms being generated from standard normal distributions; Non-Gaussian case: The data are generated according to the graph obtained by removing variable X3 from Figure 3, with the noise terms being generated from standard exponential distributions. In three cases, the connected coefficient βk is sampled from a uniform distribution between [-1, 1]. ... The sample size is selected from {1,000 (1k), 3,000 (3k), 5,000 (5k)}. Each experiment was repeated 100 times with randomly generated data and the results were averaged. |
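
The experiment setup quoted in the table can be illustrated with a minimal Python sketch. It is not the authors' code. The causal graph of Figure 3 is not reproduced on this page, so the sketch uses a hypothetical stand-in graph with one unmeasured confounder `U`, a treatment `X`, an outcome `Y`, and two proxies `Z` and `W`. The coefficient range is assumed to be [-1, 1], and `naive_slope` is a placeholder estimator, not Proxy-Rank or Proxy-GIN. The function names `simulate` and `run_protocol` are also illustrative.

```python
import numpy as np


def simulate(n, noise="gaussian", rng=None):
    """Draw n samples from a hypothetical linear model (NOT the paper's Figure 3).

    Assumed graph: U -> Z, U -> W, U -> X, U -> Y, X -> Y, with U unobserved.
    """
    rng = np.random.default_rng(rng)
    # Standard normal noise (Gaussian case) or standard exponential noise
    # (non-Gaussian case), as described in the setup.
    draw = rng.standard_normal if noise == "gaussian" else rng.standard_exponential
    # Edge coefficients sampled uniformly; the range [-1, 1] is an assumption.
    beta = rng.uniform(-1.0, 1.0, size=5)
    u = draw(n)
    z = beta[0] * u + draw(n)
    w = beta[1] * u + draw(n)
    x = beta[2] * u + draw(n)
    y = beta[3] * x + beta[4] * u + draw(n)
    # U is dropped from the returned data because it is unmeasured.
    return {"X": x, "Y": y, "Z": z, "W": w}, beta


def naive_slope(x, y):
    """Placeholder estimator: OLS slope of y on x (ignores confounding)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def run_protocol(sample_sizes=(1000, 3000, 5000), repeats=100,
                 noise="gaussian", seed=0):
    """Repeat each setting `repeats` times and average the absolute error."""
    rng = np.random.default_rng(seed)
    results = {}
    for n in sample_sizes:
        errors = []
        for _ in range(repeats):
            data, beta = simulate(n, noise, rng)
            # beta[3] is the true causal effect of X on Y in this toy model.
            errors.append(abs(naive_slope(data["X"], data["Y"]) - beta[3]))
        results[n] = float(np.mean(errors))
    return results


if __name__ == "__main__":
    for case in ("gaussian", "exponential"):
        print(case, run_protocol(noise=case))
```

Because the placeholder estimator does not adjust for `U`, its averaged error is not expected to vanish as the sample size grows. A proxy-based estimator would take the place of `naive_slope` in the same loop.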