Automating the Selection of Proxy Variables of Unmeasured Confounders

Authors: Feng Xie, Zhengming Chen, Shanshan Luo, Wang Miao, Ruichu Cai, Zhi Geng

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on both synthetic and real-world data show the effectiveness of the proposed approach.
Researcher Affiliation | Academia | (1) Department of Applied Statistics, Beijing Technology and Business University, Beijing, China; (2) School of Computer Science, Guangdong University of Technology, Guangzhou 510006, China; (3) Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE; (4) Department of Probability and Statistics, Peking University, Beijing, China.
Pseudocode | Yes | Algorithm 1 Proxy-Rank; Algorithm 2 Proxy-GIN
Open Source Code | Yes | The source code is in the Supplementary file.
Open Datasets | Yes | We here consider the following two typical settings: Gaussian case: The data are generated according to the causal graph in Figure 3, with the noise terms being generated from standard normal distributions; Non-Gaussian case: The data are generated according to the graph obtained by removing variable X3 from Figure 3, with the noise terms being generated from standard exponential distributions. ... We apply the proposed methods to analyze the causal effects of gene expressions on the body weight of F2 mice using the mouse obesity dataset as described by Wang et al. (2006).
Dataset Splits | No | The paper mentions data generation and sample sizes but does not specify train/validation/test splits or a cross-validation setup for the datasets used in the experiments.
Hardware Specification | No | The paper does not specify any hardware (e.g., GPU models, CPU types, or cloud computing instances) used for running the experiments.
Software Dependencies | No | The paper mentions using T. W. Anderson's canonical correlation-based rank test and an HSIC-based independence test, but it does not specify the software packages, libraries, or version numbers used for the implementation.
Experiment Setup | Yes | Gaussian case: The data are generated according to the causal graph in Figure 3, with the noise terms being generated from standard normal distributions; Non-Gaussian case: The data are generated according to the graph obtained by removing variable X3 from Figure 3, with the noise terms being generated from standard exponential distributions. In three cases, the connected coefficient βk is sampled from a uniform distribution between [-1, 1]. ... The sample size is selected from {1,000 (1k), 3,000 (3k), 5,000 (5k)}. Each experiment was repeated 100 times with randomly generated data and the results were averaged.
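
The simulation protocol in the Experiment Setup entry can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the causal graph of Figure 3 is not reproduced here, so the graph below (one unmeasured confounder U, two proxies Z and W, treatment X, outcome Y) is a hypothetical stand-in, and the coefficient range [-1, 1] is assumed. The function names `draw_noise`, `simulate`, `naive_slope`, and `average_naive_bias` are likewise illustrative. The averaged quantity is the bias of a naive regression of Y on X, chosen only to show the repeat-and-average loop.

```python
import random

SAMPLE_SIZES = (1_000, 3_000, 5_000)  # sample sizes listed in the setup
N_REPEATS = 100                        # repetitions per configuration


def draw_noise(rng, kind):
    """Return one standard normal or standard exponential noise draw."""
    if kind == "gaussian":
        return rng.gauss(0.0, 1.0)
    if kind == "exponential":
        return rng.expovariate(1.0)
    raise ValueError(f"unknown noise kind: {kind!r}")


def simulate(rng, n, kind):
    """Draw one dataset from a HYPOTHETICAL linear graph (not Figure 3)."""
    # Edge coefficients sampled uniformly from [-1, 1] (assumed range).
    b = [rng.uniform(-1.0, 1.0) for _ in range(5)]
    data = {"X": [], "Y": [], "Z": [], "W": [], "true_effect": b[3]}
    for _ in range(n):
        u = draw_noise(rng, kind)                       # unmeasured confounder
        z = b[0] * u + draw_noise(rng, kind)            # first proxy
        w = b[1] * u + draw_noise(rng, kind)            # second proxy
        x = b[2] * u + draw_noise(rng, kind)            # treatment
        y = b[3] * x + b[4] * u + draw_noise(rng, kind) # outcome
        for key, value in zip("XYZW", (x, y, z, w)):
            data[key].append(value)
    return data


def naive_slope(x, y):
    """Least-squares slope of y on x, ignoring the confounder."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def average_naive_bias(n, kind, repeats=N_REPEATS, seed=0):
    """Average absolute error of the naive slope over repeated datasets."""
    rng = random.Random(seed)
    errors = []
    for _ in range(repeats):
        d = simulate(rng, n, kind)
        errors.append(abs(naive_slope(d["X"], d["Y"]) - d["true_effect"]))
    return sum(errors) / len(errors)
```

Calling `average_naive_bias(n, kind)` for each `n` in `SAMPLE_SIZES` and each noise kind would mirror the two settings and three sample sizes described above; a proxy-based estimator would replace `naive_slope` in an actual replication.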