B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding

Authors: Miruna Oprescu, Jacob Dorn, Marah Ghoummaid, Andrew Jesson, Nathan Kallus, Uri Shalit

ICML 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Semi-synthetic experimental comparisons validate the theoretical findings, and we use real-world data to demonstrate how the method might be used in practice. We evaluate the B-Learner using synthetic and semi-synthetic experiments. In semi-synthetic experiments, we find the B-Learner is at least as effective as existing state-of-the-art models on a previously proposed benchmark. Finally, we illustrate the use of the B-Learner on real data to demonstrate how the method might be used in practice.
Researcher Affiliation Academia Cornell University and Cornell Tech; Princeton University; Technion, Israel Institute of Technology; OATML, University of Oxford.
Pseudocode Yes Our procedure is summarized in Algorithm 1 (see Appendix E for a detailed version). Appendix E provides 'Algorithm 1 The B-Learner: Detailed'.
Open Source Code Yes We provide replication code at https://github.com/CausalML/BLearner.
Open Datasets Yes We replicate the experiment from Jesson et al. (2021) on IHDP Hidden Confounding. The dataset contains synthetic potential outcomes generated according to the response surface B described by Hill (2011). We use the real-world dataset from Chernozhukov & Hansen (2004) that draws on the 1991 Survey of Income and Program Participation.
Dataset Splits Yes Each realization is split into training (n = 470), validation (n = 202), and test (n = 75) subsets.
Hardware Specification Yes The results in Section 8 were obtained using an Amazon Web Services instance with 32 vCPUs and 64 GiB of RAM.
Software Dependencies No The paper mentions software packages like 'scikit-learn' and 'PyTorch' but does not specify their version numbers, which is required for reproducibility.
Experiment Setup Yes Table 2 provides 'Hyperparameters for model choices in synthetic data experiments': Random Forest (scikit-learn): max depth 6, min samples leaf 0.05; RBF kernel (scikit-learn): length scale 0.9 · n^(-1/(4+d)); Neural Network (PyTorch): hidden units 100, network depth 4, negative slope 0.3, dropout rate 0.2, batch size 50, learning rate 5e-4. For the 401(k) data: 'hyperparameters (n estimators = 100, max depth = 7, max features = 3, min samples leaf = 10)'.
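The reported dataset splits (train n = 470, validation n = 202, test n = 75, summing to the 747 units of the standard IHDP benchmark) can be reproduced with a simple random partition. This is a minimal sketch: only the subset sizes come from the paper; the splitting function name, the seed, and the assumption that the split is a uniform random permutation are ours.

```python
import numpy as np

def split_ihdp(n_total=747, n_train=470, n_val=202, n_test=75, seed=0):
    """Partition indices into train/val/test subsets of the sizes
    reported in the paper (470/202/75). The random-permutation
    procedure and seed are assumptions, not taken from the source."""
    assert n_train + n_val + n_test == n_total
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_total)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

train_idx, val_idx, test_idx = split_ihdp()
```

In the paper's setup this split is drawn per realization of the semi-synthetic outcomes, so the partition would be repeated for each replication.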
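The 401(k) hyperparameters quoted above map directly onto a scikit-learn random forest. This sketch assumes a `RandomForestRegressor` as the estimator class (the paper quotes only the hyperparameter values, not the exact class); the toy data is ours, purely to show the configuration fitting.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hyperparameters as quoted for the 401(k) experiment; any argument
# not listed in the paper is left at its scikit-learn default.
rf_401k = RandomForestRegressor(
    n_estimators=100,
    max_depth=7,
    max_features=3,
    min_samples_leaf=10,
)

# Hypothetical toy data (50 samples, 3 features) just to exercise the model.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)
rf_401k.fit(X, y)
preds = rf_401k.predict(X)
```

Note that `max_features=3` requires at least three input features, and `min_samples_leaf=10` keeps leaves coarse, which regularizes the nuisance fits.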