Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Conformal Meta-learners for Predictive Inference of Individual Treatment Effects
Authors: Ahmed M. Alaa, Zaid Ahmad, Mark van der Laan
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments show that conformal meta-learners provide valid intervals with competitive efficiency while retaining the favorable point estimation properties of CATE meta-learners. We present a number of representative experiments in this Section and defer further results to Appendix C. |
| Researcher Affiliation | Academia | Ahmed M. Alaa UC Berkeley and UCSF EMAIL Zaid Ahmad UC Berkeley EMAIL Mark van der Laan UC Berkeley EMAIL |
| Pseudocode | Yes | Algorithm 1: Conformal Meta-learner |
| Open Source Code | Yes | Code: https://github.com/AlaaLab/conformal-metalearners |
| Open Datasets | Yes | We also consider two well-known semi-synthetic datasets that involve real covariates and simulated outcomes. The first is the National Study of Learning Mindsets (NLSM) [3], and the second is the IHDP benchmark originally developed in [8]. In our experiments, we used the 100 realizations of the training and testing data released by [6] in https://www.fredjo.com/files/ihdp_npci_1-100.train.npz and https://www.fredjo.com/files/ihdp_npci_1-100.test.npz. |
| Dataset Splits | Yes | Unless otherwise stated, all experiments followed a 90%/10% train/test split of each dataset, and each training set was further split into 75%/25% proper training/calibration sets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments. |
| Software Dependencies | No | The paper mentions using a "Gradient Boosting model" and an "R package" with "rpy2 wrappers" for baselines, but does not specify version numbers for these software components or Python. |
| Experiment Setup | Yes | In all experiments, we used a Gradient Boosting model with 100 trees as the base model for nuisance estimation and quantile regression on pseudo-outcomes. The target coverage in all experiments was set to 1 − α = 0.9. |
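
The split and setup reported in the table can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `split_indices`, the random seed, the shuffling scheme, and the rounding rule are all hypothetical. Only the 90%/10% and 75%/25% proportions, the 100 trees, and the 0.9 target coverage come from the quoted text.

```python
import random

# Values quoted in the "Experiment Setup" row.
N_TREES = 100            # trees in the Gradient Boosting base model
TARGET_COVERAGE = 0.9    # 1 - alpha
ALPHA = 1 - TARGET_COVERAGE


def split_indices(n, test_frac=0.10, calib_frac=0.25, seed=0):
    """Split range(n) into proper-training, calibration, and test indices.

    Hypothetical helper: the seed and the rounding rule are assumptions,
    not details taken from the paper.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)

    # 90%/10% train/test split.
    n_test = round(n * test_frac)
    test, train = idx[:n_test], idx[n_test:]

    # 75%/25% proper-training/calibration split of the training set.
    n_calib = round(len(train) * calib_frac)
    calib, proper = train[:n_calib], train[n_calib:]
    return proper, calib, test


proper, calib, test = split_indices(1000)
print(len(proper), len(calib), len(test))  # 675 225 100
```

With 1000 samples this gives 675 proper-training, 225 calibration, and 100 test points. The three index sets are disjoint, so the calibration data stays separate from both model fitting and evaluation.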