reproducibilityindex.ai

Automating the Selection of Proxy Variables of Unmeasured Confounders

Authors: Feng Xie, Zhengming Chen, Shanshan Luo, Wang Miao, Ruichu Cai, Zhi Geng

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results on both synthetic and real-world data show the effectiveness of the proposed approach.
Researcher Affiliation	Academia	(1) Department of Applied Statistics, Beijing Technology and Business University, Beijing, China; (2) School of Computer Science, Guangdong University of Technology, Guangzhou 510006, China; (3) Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE; (4) Department of Probability and Statistics, Peking University, Beijing, China.
Pseudocode	Yes	Algorithm 1 Proxy-Rank; Algorithm 2 Proxy-GIN
Open Source Code	Yes	The source code is in the Supplementary file.
Open Datasets	Yes	We here consider the following two typical settings: Gaussian case: The data are generated according to the causal graph in Figure 3, with the noise terms being generated from standard normal distributions; Non-Gaussian case: The data are generated according to the graph obtained by removing variable X3 from Figure 3, with the noise terms being generated from standard exponential distributions. ... We apply the proposed methods to analyze the causal effects of gene expressions on the body weight of F2 mice using the mouse obesity dataset as described by Wang et al. (2006).
Dataset Splits	No	The paper mentions data generation and sample sizes but does not specify train/validation/test splits or cross-validation setup for the datasets used in experiments.
Hardware Specification	No	The paper does not specify any hardware (e.g., GPU models, CPU types, or cloud computing instances) used for running the experiments.
Software Dependencies	No	The paper mentions using T. W. Anderson's canonical correlation-based rank test and HSIC-based independence test, but it does not specify any software packages, libraries, or their version numbers used for implementation.
Experiment Setup	Yes	The data are generated according to the causal graph in Figure 3, with the noise terms being generated from standard normal distributions; Non-Gaussian case: The data are generated according to the graph obtained by removing variable X3 from Figure 3, with the noise terms being generated from standard exponential distributions. In three cases, the connected coefficient βk is sampled from a uniform distribution between [-1, 1]. ... The sample size is selected from {1,000 (1k), 3,000 (3k), 5,000 (5k)}. Each experiment was repeated 100 times with randomly generated data and the results were averaged.
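
The experiment setup quoted above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code. The causal graph of Figure 3 is not reproduced on this page, so the graph used here (one unmeasured confounder U with children X, Y, Z, W, plus an edge X -> Y) is a hypothetical stand-in. The coefficient interval [-1, 1] is an assumption about the garbled range in the quoted text. The names `draw_noise`, `simulate`, `naive_slope`, and `run_setting` are made up for illustration, and the naive regression slope is a placeholder estimator, not Proxy-Rank or Proxy-GIN.

```python
import random
from statistics import fmean

SAMPLE_SIZES = (1_000, 3_000, 5_000)  # sample sizes quoted in the setup
N_REPEATS = 100                       # repetitions per setting, as quoted


def draw_noise(rng, n, kind):
    """Draw n noise terms: standard normal or standard exponential."""
    if kind == "gaussian":
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    if kind == "exponential":
        return [rng.expovariate(1.0) for _ in range(n)]
    raise ValueError(f"unknown noise kind: {kind!r}")


def simulate(n, kind, rng):
    """Generate one dataset from a HYPOTHETICAL linear model.

    Assumed graph (not Figure 3 of the paper): U -> Z, U -> W, U -> X,
    U -> Y, X -> Y, with U unmeasured. Coefficients are drawn uniformly
    from [-1, 1], which is an assumption about the quoted interval.
    """
    b_uz, b_uw, b_ux, b_xy, b_uy = (rng.uniform(-1.0, 1.0) for _ in range(5))
    u = draw_noise(rng, n, kind)
    e_z, e_w, e_x, e_y = (draw_noise(rng, n, kind) for _ in range(4))
    z = [b_uz * ui + e for ui, e in zip(u, e_z)]
    w = [b_uw * ui + e for ui, e in zip(u, e_w)]
    x = [b_ux * ui + e for ui, e in zip(u, e_x)]
    y = [b_xy * xi + b_uy * ui + e for xi, ui, e in zip(x, u, e_y)]
    # U is deliberately left out of the returned data: it is unmeasured.
    return {"X": x, "Y": y, "Z": z, "W": w}, b_xy


def naive_slope(x, y):
    """Least-squares slope of y on x, ignoring confounding (placeholder)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def run_setting(kind, n, n_repeats=N_REPEATS, seed=0):
    """Average absolute error of the placeholder estimator over repetitions."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_repeats):
        data, true_effect = simulate(n, kind, rng)
        errors.append(abs(naive_slope(data["X"], data["Y"]) - true_effect))
    return fmean(errors)
```

Calling `run_setting("gaussian", n)` for each `n` in `SAMPLE_SIZES` mirrors the quoted protocol of averaging over 100 randomly generated datasets per sample size. To evaluate a proxy-based method, the placeholder `naive_slope` would be swapped for the estimator of interest.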