Learning Shadow Variable Representation for Treatment Effect Estimation under Collider Bias

Authors: Baohong Li, Haoxuan Li, Ruoxuan Xiong, Anpeng Wu, Fei Wu, Kun Kuang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that the proposed methods outperform existing treatment effect estimation methods under collider bias and demonstrate their potential application value.
Researcher Affiliation | Academia | College of Computer Science and Technology, Zhejiang University, Hangzhou, China; Center for Data Science, Peking University, Beijing, China; Department of Quantitative Theory & Methods, Emory University, Atlanta, USA.
Pseudocode | Yes | A. Pseudo-Codes of Shadow Catcher and Shadow Estimator: As stated in Section 2, we propose a novel Shadow Catcher that generates representations serving the role of shadow variables and a novel Shadow Estimator that estimates treatment effects under collider bias with the help of the generated representations. The pseudo-codes of Shadow Catcher and Shadow Estimator are detailed in Algorithms 1 and 2, where g denotes the representation generator, h_y1 the selected-outcome estimator, h_y0 the unselected-outcome estimator, h_r the representation estimator, h_z1 and h_z0 the shadow-variable estimators, e_or the odds-ratio estimator, h_s the sample-selection estimator, and q the Q-function solver. (A hedged component sketch follows the table.)
Open Source Code | Yes | The pseudo-codes are in Appendix A, and the source code is available at https://github.com/ZJUBaohongLi/ShadowCatcher-ShadowEstimator.
Open Datasets | Yes | The IHDP dataset is from a study evaluating the effect of specialist home visits on the future cognitive test scores of premature infants (Brooks-Gunn et al., 1992)... the ACIC 2016 dataset (Dorie et al., 2019)... and the Jobs dataset (Shalit et al., 2017)... The Twins dataset is from a study evaluating the effect of low birth weight on the mortality of infants in their first year of life (Almond et al., 2005).
Dataset Splits | Yes | We split each dataset into 60/20/20 train/validation/test sets, independently repeated 20 times, and report the mean and standard deviation (std) of PEHE for all experiments, written as mean ± std in the tables. (A sketch of this protocol follows the table.)
Hardware Specification | Yes | The CPU was a 13th Gen Intel(R) Core(TM) i7-13700K, and the GPU was an NVIDIA GeForce RTX 3080 with CUDA 12.1.
Software Dependencies | No | We implemented all the methods in the PyTorch environment with Python 3.9.
Experiment Setup | Yes | The hyperparameters of our methods on different datasets are detailed in Table 3.
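
To make the components named in the Pseudocode row concrete, here is a minimal PyTorch sketch of how the generator g and a few of the estimators (h_y1, h_y0, h_s) might be parameterized. The MLP widths, the representation dimension z_dim, and the forward interface are illustrative assumptions, not the authors' released implementation; the remaining estimators (h_r, h_z1, h_z0, e_or, q) would follow the same pattern.

```python
import torch
import torch.nn as nn


def mlp(in_dim, out_dim, hidden=64):
    # Two-layer MLP; the width is an illustrative assumption.
    return nn.Sequential(
        nn.Linear(in_dim, hidden),
        nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


class ShadowCatcher(nn.Module):
    """Generates z = g(x) to serve as a shadow-variable representation,
    alongside outcome and selection estimators named in Appendix A.
    Architecture choices here are assumptions, not the released code."""

    def __init__(self, x_dim, z_dim=8):
        super().__init__()
        self.g = mlp(x_dim, z_dim)     # representation generator g
        self.h_y1 = mlp(x_dim + 1, 1)  # outcome estimator on selected units
        self.h_y0 = mlp(x_dim + 1, 1)  # outcome estimator on unselected units
        self.h_s = mlp(x_dim + 1, 1)   # sample-selection estimator P(S=1 | x, t)

    def forward(self, x, t):
        z = self.g(x)
        xt = torch.cat([x, t], dim=-1)
        return z, self.h_y1(xt), self.h_y0(xt), torch.sigmoid(self.h_s(xt))


# Toy usage on random data
x = torch.randn(32, 10)
t = torch.randint(0, 2, (32, 1)).float()
z, y1_hat, y0_hat, s_hat = ShadowCatcher(x_dim=10)(x, t)
```

The training objective that ties these heads together (enforcing the shadow-variable conditions via the odds-ratio estimator and Q-function solver) is specified in the paper's Algorithms 1 and 2 and is not reproduced here.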
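The evaluation protocol from the Dataset Splits row can likewise be sketched in code. This is a self-contained toy version on synthetic data: the linear T-learner is a hypothetical stand-in for the paper's Shadow Estimator, the rooted PEHE variant is assumed, and the validation fold is created but left where model selection would occur.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data; the real experiments use IHDP, ACIC 2016, Jobs, and Twins.
rng = np.random.default_rng(0)
n, d = 1000, 10
X = rng.normal(size=(n, d))
T = rng.integers(0, 2, size=n)
tau = X[:, 0]                                   # synthetic ground-truth CATE
Y = X.sum(axis=1) + tau * T + rng.normal(size=n)


def pehe(tau_hat, tau_true):
    # Rooted PEHE: sqrt of the mean squared error of the CATE estimate.
    return np.sqrt(np.mean((tau_hat - tau_true) ** 2))


scores = []
for seed in range(20):                          # 20 independent repetitions
    idx_train, idx_rest = train_test_split(
        np.arange(n), train_size=0.6, random_state=seed)
    idx_val, idx_test = train_test_split(
        idx_rest, train_size=0.5, random_state=seed)  # 20/20 of the total
    # Stand-in estimator (plain T-learner); idx_val is where
    # hyperparameter selection would happen for the real methods.
    m1 = LinearRegression().fit(X[idx_train][T[idx_train] == 1],
                                Y[idx_train][T[idx_train] == 1])
    m0 = LinearRegression().fit(X[idx_train][T[idx_train] == 0],
                                Y[idx_train][T[idx_train] == 0])
    tau_hat = m1.predict(X[idx_test]) - m0.predict(X[idx_test])
    scores.append(pehe(tau_hat, tau[idx_test]))

print(f"PEHE: {np.mean(scores):.3f} ± {np.std(scores):.3f}")
```

Swapping the T-learner for the Shadow Catcher/Shadow Estimator pipeline would reproduce the reporting format of the paper's tables (mean ± std over 20 splits).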