Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Conformal Meta-learners for Predictive Inference of Individual Treatment Effects

Authors: Ahmed M. Alaa, Zaid Ahmad, Mark van der Laan

NeurIPS 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Numerical experiments show that conformal meta-learners provide valid intervals with competitive efficiency while retaining the favorable point estimation properties of CATE meta-learners. We present a number of representative experiments in this Section and defer further results to Appendix C.
Researcher Affiliation Academia Ahmed M. Alaa (UC Berkeley and UCSF), Zaid Ahmad (UC Berkeley), Mark van der Laan (UC Berkeley)
Pseudocode Yes Algorithm 1: Conformal Meta-learner
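The core idea behind Algorithm 1 is split conformal prediction applied to a meta-learner's pseudo-outcomes. The following is a minimal, self-contained sketch of the split-conformal step on hypothetical synthetic data (the synthetic regression problem and all variable names are illustrative, not the paper's actual pipeline):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Hypothetical synthetic data standing in for the pseudo-outcomes
# that a CATE meta-learner would produce.
X = rng.normal(size=(1000, 3))
y = X[:, 0] + 0.5 * rng.normal(size=1000)

# Split into a proper training set and a held-out calibration set.
n_train = 750
X_tr, y_tr = X[:n_train], y[:n_train]
X_cal, y_cal = X[n_train:], y[n_train:]

model = GradientBoostingRegressor(n_estimators=100).fit(X_tr, y_tr)

# Conformity scores: absolute residuals on the calibration set.
scores = np.abs(y_cal - model.predict(X_cal))
alpha = 0.1
n_cal = len(scores)
# Finite-sample-corrected empirical quantile of the scores.
level = min(1.0, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal)
q = np.quantile(scores, level, method="higher")

# Check marginal coverage on fresh data drawn from the same distribution.
X_te = rng.normal(size=(500, 3))
y_te = X_te[:, 0] + 0.5 * rng.normal(size=500)
pred = model.predict(X_te)
covered = np.mean((y_te >= pred - q) & (y_te <= pred + q))
```

The resulting intervals `pred ± q` attain approximately the nominal 90% marginal coverage on exchangeable test data; the paper's contribution lies in justifying this construction for pseudo-outcomes rather than observed outcomes.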
Open Source Code Yes Code: https://github.com/AlaaLab/conformal-metalearners
Open Datasets Yes We also consider two well-known semi-synthetic datasets that involve real covariates and simulated outcomes. The first is the National Study of Learning Mindsets (NLSM) [3], and the second is the IHDP benchmark originally developed in [8]. In our experiments, we used the 100 realizations of the training and testing data released by [6] at https://www.fredjo.com/files/ihdp_npci_1-100.train.npz and https://www.fredjo.com/files/ihdp_npci_1-100.test.npz.
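The released IHDP files bundle all 100 realizations into single `.npz` archives. A sketch of the access pattern, using a locally created mock file so the snippet runs offline (the field names `x`, `t`, `yf` and the array shapes are assumptions about the released files, not verified here; the last axis indexes the realization):

```python
import numpy as np
import os
import tempfile

# Mock file standing in for ihdp_npci_1-100.train.npz.
path = os.path.join(tempfile.mkdtemp(), "ihdp_mock.npz")
np.savez(
    path,
    x=np.random.randn(672, 25, 100),   # covariates (assumed shape)
    t=np.random.randint(0, 2, size=(672, 100)),  # treatment indicator
    yf=np.random.randn(672, 100),      # factual outcome
)

data = np.load(path)
for r in range(data["x"].shape[2]):  # loop over the 100 realizations
    X, t, y = data["x"][:, :, r], data["t"][:, r], data["yf"][:, r]
    # ... fit and evaluate a meta-learner on realization r ...
    if r == 0:
        first_shape = X.shape
```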
Dataset Splits Yes Unless otherwise stated, all experiments followed a 90%/10% train/test split of each dataset, and each training set was further split 75%/25% into proper training and calibration sets.
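The two-stage split described above can be sketched as follows (a minimal illustration with placeholder data; the paper does not specify the splitting utility or random seeds used):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randn(1000, 5)
y = np.random.randn(1000)

# Stage 1: 90%/10% train/test split of the full dataset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=0
)
# Stage 2: 75%/25% proper-training/calibration split of the training set.
X_fit, X_cal, y_fit, y_cal = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0
)
```

With 1000 samples this yields 675 proper-training, 225 calibration, and 100 test points.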
Hardware Specification No The paper does not provide specific details about the hardware used for running experiments.
Software Dependencies No The paper mentions using a "Gradient Boosting model" and an "R package" with "rpy2 wrappers" for baselines, but does not specify version numbers for these software components or Python.
Experiment Setup Yes In all experiments, we used a Gradient Boosting model with 100 trees as the base model for nuisance estimation and quantile regression on pseudo-outcomes. The target coverage in all experiments was set to 1 − α = 0.9.
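The quantile-regression component of this setup can be sketched with scikit-learn's quantile-loss gradient boosting: two 100-tree models estimate the lower and upper interval endpoints at nominal level 1 − α = 0.9. This is an illustrative sketch on synthetic data, not the paper's exact configuration (hyperparameters beyond the tree count are assumptions):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(800, 2))
y = X[:, 0] + rng.normal(scale=0.3, size=800)

alpha = 0.1  # target coverage 1 - alpha = 0.9

# One quantile-loss model per interval endpoint, each with 100 trees,
# mirroring the paper's 100-tree gradient boosting base models.
lo = GradientBoostingRegressor(
    loss="quantile", alpha=alpha / 2, n_estimators=100
).fit(X, y)
hi = GradientBoostingRegressor(
    loss="quantile", alpha=1 - alpha / 2, n_estimators=100
).fit(X, y)

X_new = rng.normal(size=(200, 2))
lower, upper = lo.predict(X_new), hi.predict(X_new)
```

In the conformal meta-learner pipeline, these raw quantile estimates would then be conformalized on the calibration set to restore finite-sample coverage guarantees.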