Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding
Authors: Miruna Oprescu, Jacob Dorn, Marah Ghoummaid, Andrew Jesson, Nathan Kallus, Uri Shalit
ICML 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Semi-synthetic experimental comparisons validate the theoretical findings, and we use real-world data to demonstrate how the method might be used in practice. We evaluate the B-Learner using synthetic and semi-synthetic experiments. In semi-synthetic experiments, we find the B-Learner is at least as effective as existing state-of-art models on a previously proposed benchmark. Finally, we illustrate the use of the B-Learner using real data to demonstrate how the method might be used in practice. |
| Researcher Affiliation | Academia | (1) Cornell University and Cornell Tech; (2) Princeton University; (3) Technion, Israel Institute of Technology; (4) OATML, University of Oxford. |
| Pseudocode | Yes | Our procedure is summarized in Algorithm 1 (see Appendix E for a detailed version). Appendix E provides 'Algorithm 1 The B-Learner: Detailed'. |
| Open Source Code | Yes | We provide replication code at https://github.com/CausalML/BLearner. |
| Open Datasets | Yes | We replicate the experiment from Jesson et al. (2021) on IHDP Hidden Confounding. The dataset contains synthetic potential outcomes generated according to the response surface B described by Hill (2011). We use the real-world dataset from Chernozhukov & Hansen (2004) that draws on the 1991 Survey of Income and Program Participation. |
| Dataset Splits | Yes | Each realization is split into training (n = 470), validation (n = 202), and test (n = 75) subsets. |
| Hardware Specification | Yes | The results in Section 8 were obtained using an Amazon Web Services instance with 32 vCPUs and 64 GiB of RAM. |
| Software Dependencies | No | The paper mentions software packages like 'scikit-learn' and 'PyTorch' but does not specify their version numbers, which is required for reproducibility. |
| Experiment Setup | Yes | Table 2 provides 'Hyperparameters for model choices in synthetic data experiments.' listing: Random Forest (scikit-learn): max depth 6, min samples leaf 0.05; RBF (scikit-learn): length scale $0.9\,n^{-1/(4+d)}$; Neural Network (PyTorch): hidden units 100, network depth 4, negative slope 0.3, dropout rate 0.2, batch size 50, learning rate 5e-4. For 401(k) data: 'hyperparameters (n estimators = 100, max depth = 7, max features = 3, min samples leaf = 10)'. |
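
The split sizes and random forest settings reported in the table can be collected into a small configuration, sketched below with only the Python standard library. This is an illustration, not the authors' replication code: the function name `split_indices`, the seed, and the shuffle-then-cut procedure are hypothetical, and mapping the reported hyperparameter labels to scikit-learn keyword names (`max_depth`, `min_samples_leaf`, `n_estimators`, `max_features`) is an assumption.

```python
import random

# Split sizes reported in the "Dataset Splits" row (470 + 202 + 75 = 747).
SPLIT_SIZES = {"train": 470, "validation": 202, "test": 75}

# Random forest settings from the "Experiment Setup" row, keyed by
# scikit-learn keyword names. The label-to-keyword mapping is an assumption.
SYNTHETIC_FOREST_KWARGS = {"max_depth": 6, "min_samples_leaf": 0.05}
FOREST_401K_KWARGS = {
    "n_estimators": 100,
    "max_depth": 7,
    "max_features": 3,
    "min_samples_leaf": 10,
}


def split_indices(n_rows, sizes=SPLIT_SIZES, seed=0):
    """Shuffle row indices and cut them into disjoint named subsets.

    Hypothetical helper: the paper's own splitting procedure is not
    described in the table above.
    """
    if sum(sizes.values()) != n_rows:
        raise ValueError("split sizes must sum to the number of rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded for repeatability
    splits, start = {}, 0
    for name, size in sizes.items():
        splits[name] = indices[start:start + size]
        start += size
    return splits


if __name__ == "__main__":
    splits = split_indices(747)
    print({name: len(rows) for name, rows in splits.items()})
```

If scikit-learn is installed, the dictionaries could be unpacked into an estimator, for example `RandomForestRegressor(**FOREST_401K_KWARGS)`. Whether a regressor or a classifier is appropriate for each model is not stated in the table, and package versions are not reported, so results may differ from the paper's.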