Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Estimating individual treatment effect: generalization bounds and algorithms
Authors: Uri Shalit, Fredrik D. Johansson, David Sontag
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on real and simulated data show the new algorithms match or outperform the state-of-the-art. |
| Researcher Affiliation | Academia | ¹CIMS, New York University, New York, NY 10003; ²IMES, MIT, Cambridge, MA 02142; ³CSAIL, MIT, Cambridge, MA 02139. Correspondence to: Uri Shalit <EMAIL>, Fredrik D. Johansson <EMAIL>, David Sontag <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 CFR: Counterfactual regression with integral probability metrics |
| Open Source Code | Yes | Both versions are implemented as feed-forward neural networks... (footnote 3: https://github.com/clinicalml/cfrnet) |
| Open Datasets | Yes | Hill (2011) compiled a dataset for causal effect estimation based on the Infant Health and Development Program (IHDP)... The study by LaLonde (1986) is a widely used benchmark in the causal inference community... |
| Dataset Splits | Yes | We average over 1000 realizations of the outcomes with 63/27/10 train/validation/test splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments (e.g., CPU, GPU models, or memory). |
| Software Dependencies | No | The paper mentions software like 'Adam (Kingma & Ba, 2015)' and 'NPCI package (Dorie, 2016)' but does not specify version numbers for any libraries or frameworks used in their implementation. |
| Experiment Setup | Yes | Layer sizes were 200 for all layers used for Jobs and 200 and 100 for the representation and hypothesis used for IHDP. The model is trained using Adam (Kingma & Ba, 2015). The hypothesis parameters are regularized with a small ℓ2 weight decay. For continuous data we use mean squared loss and for binary data, we use log-loss. |
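
The 63/27/10 train/validation/test split reported in the Dataset Splits row can be illustrated with a minimal sketch. This is not the authors' code: the function name `split_indices`, the seed handling, the rounding rule, and the sample size of 1000 units are all assumptions made for illustration, and the paper's own procedure may differ (for example, in how splits are redrawn per realization).

```python
import random


def split_indices(n, fractions=(0.63, 0.27, 0.10), seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists.

    Hypothetical helper: fractions default to the 63/27/10 split quoted
    in the table; the seed and rounding rule are illustrative choices.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_train = round(fractions[0] * n)
    n_valid = round(fractions[1] * n)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    # The test set takes whatever remains, so no unit is dropped.
    test = indices[n_train + n_valid:]
    return train, valid, test


# Assumed sample size of 1000 units, chosen only to keep the arithmetic clean.
train, valid, test = split_indices(1000)
print(len(train), len(valid), len(test))  # 630 270 100
```

Using a different `seed` per realization would give an independent split each time; whether the original experiments do this is not stated in the table above.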