reproducibilityindex.ai

Double Machine Learning Density Estimation for Local Treatment Effects with Instruments

Authors: Yonghan Jung, Jin Tian, Elias Bareinboim

NeurIPS 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The use of the proposed methods is illustrated through both synthetic and a real dataset called 401(k). We illustrate the proposed methods on synthetic and real data.
Researcher Affiliation	Academia	Yonghan Jung Purdue University jung222@purdue.edu, Jin Tian Iowa State University jtian@iastate.edu, Elias Bareinboim Columbia University eb@cs.columbia.edu
Pseudocode	No	The paper describes algorithmic steps but does not include a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code	No	The paper does not provide any explicit statements about making its source code available or links to a code repository.
Open Datasets	Yes	In our analysis, we used the dataset introduced by [2] containing 9275 individuals, which has been studied in [2, 17, 5, 47, 58, 64], to cite a few. [2] A. Abadie. Semiparametric instrumental variable estimation of treatment response models. Journal of Econometrics, 113(2):231–263, 2003.
Dataset Splits	No	The paper mentions 'randomly split halves of the samples' for the DML cross-fitting technique and 'separate validation data or applying cross-validation' for model selection, but it does not provide specific training/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU or CPU models used for running the experiments.
Software Dependencies	No	The paper mentions using 'XGBoost [11]' for nuisance estimation but does not provide specific version numbers for XGBoost or any other software dependencies.
Experiment Setup	Yes	We use the Gaussian kernel. The bandwidth is set to $h = 0.5\,n^{-1/5}$. In estimating the density, we choose 200 equi-spaced points $\{y_{(i)}\}_{i=1}^{200}$ in Y and evaluate both estimators at $K_{h,y_{(i)}}$ for $i = 1, \dots, 200$. We use KL divergence for $D_f$ and the normal distribution for $g(y; \beta)$. For both approaches, nuisances are estimated through a gradient boosting model XGBoost [11], which is known to be flexible.
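
The kernel and grid settings quoted in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the bandwidth exponent is read as -1/5 (the usual kernel-smoothing rate), the outcome sample is synthetic, the grid spans the observed range of the sample, and the helper name `gaussian_kernel_matrix` is hypothetical. The final step is a plain kernel density estimate used only to exercise the setup; it is not the paper's DML estimator.

```python
import numpy as np


def gaussian_kernel_matrix(y, grid, h):
    """Return K with K[i, j] = (1/h) * phi((y[j] - grid[i]) / h),
    where phi is the standard normal density (hypothetical helper)."""
    u = (y[None, :] - grid[:, None]) / h
    return np.exp(-0.5 * u**2) / (h * np.sqrt(2.0 * np.pi))


rng = np.random.default_rng(0)
n = 1000
y = rng.normal(size=n)                      # synthetic outcomes (assumption)

h = 0.5 * n ** (-1 / 5)                     # bandwidth, assuming exponent -1/5
grid = np.linspace(y.min(), y.max(), 200)   # 200 equi-spaced evaluation points

K = gaussian_kernel_matrix(y, grid, h)      # shape (200, n)

# Plain KDE at each grid point, for illustration only.
density = K.mean(axis=1)
```

Each row of `K` holds the kernel weights of all samples around one grid point, so any estimator that is an average of kernel-weighted terms can be evaluated on the whole grid with a single matrix operation.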