Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Double Machine Learning Density Estimation for Local Treatment Effects with Instruments
Authors: Yonghan Jung, Jin Tian, Elias Bareinboim
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The use of the proposed methods is illustrated through both synthetic and a real dataset called 401(k). We illustrate the proposed methods on synthetic and real data. |
| Researcher Affiliation | Academia | Yonghan Jung Purdue University EMAIL, Jin Tian Iowa State University EMAIL, Elias Bareinboim Columbia University EMAIL |
| Pseudocode | No | The paper describes algorithmic steps but does not include a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | No | The paper does not provide any explicit statements about making its source code available or links to a code repository. |
| Open Datasets | Yes | In our analysis, we used the dataset introduced by [2] containing 9275 individuals, which has been studied in [2, 17, 5, 47, 58, 64], to cite a few. [2] A. Abadie. Semiparametric instrumental variable estimation of treatment response models. Journal of Econometrics, 113(2):231–263, 2003. |
| Dataset Splits | No | The paper mentions 'randomly split halves of the samples' for the DML cross-fitting technique and 'separate validation data or applying cross-validation' for model selection, but it does not provide specific training/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'XGBoost [11]' for nuisance estimation but does not provide specific version numbers for XGBoost or any other software dependencies. |
| Experiment Setup | Yes | We use the Gaussian kernel. The bandwidth is set to $h = 0.5n^{-1/5}$. In estimating the density, we choose 200 equi-spaced points $\{y_{(i)}\}_{i=1}^{200}$ in $\mathcal{Y}$ and evaluate both estimators at $K_{h,y_{(i)}}$ for $i = 1, \ldots, 200$. We use KL divergence for $D_f$ and the normal distribution for $g(y; \beta)$. For both approaches, nuisances are estimated through a gradient boosting model XGBoost [11], which is known to be flexible. |
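
The kernel settings quoted under Experiment Setup can be illustrated with a minimal sketch. It assumes the bandwidth reads h = 0.5 n^(-1/5) and that the 200 equi-spaced points span the observed range of the outcome. It computes a plain Gaussian kernel density estimate on that grid, not the paper's double machine learning estimator, and the names `gaussian_kernel`, `bandwidth`, and `kde_on_grid` are hypothetical helpers introduced only for illustration.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def bandwidth(n, c=0.5):
    """Bandwidth h = c * n^(-1/5); the exponent's sign is an assumption."""
    return c * n ** (-1.0 / 5.0)


def kde_on_grid(samples, num_points=200):
    """Evaluate a plain kernel density estimate on an equi-spaced grid.

    The grid spans [min(samples), max(samples)], which is an assumed
    choice; the source only says the points lie in the outcome domain.
    """
    samples = np.asarray(samples, dtype=float)
    h = bandwidth(samples.size)
    grid = np.linspace(samples.min(), samples.max(), num_points)
    # K_{h,y}(Y_j) = K((Y_j - y) / h) / h, averaged over the samples Y_j.
    u = (samples[None, :] - grid[:, None]) / h
    density = gaussian_kernel(u).mean(axis=1) / h
    return grid, density


# Synthetic outcomes, used only to exercise the sketch.
rng = np.random.default_rng(0)
samples = rng.normal(size=1000)
grid, density = kde_on_grid(samples)
```

In the paper's setting, the simple average over samples would be replaced by the proposed estimators, with nuisance functions fitted by XGBoost on randomly split halves of the data.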