Double Machine Learning Density Estimation for Local Treatment Effects with Instruments

Authors: Yonghan Jung, Jin Tian, Elias Bareinboim

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed methods are illustrated on both synthetic data and a real dataset, 401(k): "We illustrate the proposed methods on synthetic and real data." |
| Researcher Affiliation | Academia | Yonghan Jung (Purdue University, jung222@purdue.edu); Jin Tian (Iowa State University, jtian@iastate.edu); Elias Bareinboim (Columbia University, eb@cs.columbia.edu) |
| Pseudocode | No | The paper describes algorithmic steps but does not include a clearly labeled "Pseudocode" or "Algorithm" block. |
| Open Source Code | No | The paper does not explicitly state that its source code is available, and it provides no link to a code repository. |
| Open Datasets | Yes | "In our analysis, we used the dataset introduced by [2] containing 9275 individuals, which has been studied in [2, 17, 5, 47, 58, 64], to cite a few." [2] A. Abadie. Semiparametric instrumental variable estimation of treatment response models. Journal of Econometrics, 113(2):231–263, 2003. |
| Dataset Splits | No | The paper mentions "randomly split halves of the samples" for the DML cross-fitting technique and "separate validation data or applying cross-validation" for model selection, but it gives no specific train/validation/test splits (e.g., percentages or sample counts) needed for reproducibility. |
| Hardware Specification | No | The paper does not report the hardware (e.g., GPU or CPU models) used to run the experiments. |
| Software Dependencies | No | The paper mentions using XGBoost [11] for nuisance estimation but gives no version numbers for XGBoost or any other software dependency. |
| Experiment Setup | Yes | "We use the Gaussian kernel. The bandwidth is set to h = 0.5 n^{-1/5}. In estimating the density, we choose 200 equi-spaced points {y^(i)}_{i=1}^{200} in Y and evaluate both estimators at K_{h,y^(i)} for i = 1, …, 200. We use KL divergence for D_f and the normal distribution for g(y; β). For both approaches, nuisances are estimated through a gradient boosting model, XGBoost [11], which is known to be flexible." |
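The kernel evaluation described in the Experiment Setup row (Gaussian kernel, bandwidth h = 0.5 n^{-1/5}, 200 equi-spaced evaluation points) can be sketched as below. This is a minimal illustrative sketch: the function names, the data-driven grid range, and the naive plug-in averaging at the end are assumptions for demonstration, not the paper's DML density estimator.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel: phi(u) = exp(-u^2/2) / sqrt(2*pi)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kernel_weights(y_samples, n_grid=200, c=0.5):
    """Evaluate K_h(y, y^(i)) for every sample y at n_grid equi-spaced
    grid points, with bandwidth h = c * n^(-1/5) as in the paper's setup.
    The grid range (sample min to max) is an assumption here."""
    n = len(y_samples)
    h = c * n ** (-1.0 / 5.0)  # h = 0.5 * n^{-1/5}
    grid = np.linspace(y_samples.min(), y_samples.max(), n_grid)
    # Shape (n, n_grid): kernel weight for each (sample, grid point) pair.
    K = gaussian_kernel((y_samples[:, None] - grid[None, :]) / h) / h
    return grid, K

rng = np.random.default_rng(0)
y = rng.normal(size=1000)
grid, K = kernel_weights(y)
# Naive plug-in average, NOT the paper's DML estimator, which instead
# plugs K into a doubly robust moment with cross-fitted nuisances.
density_hat = K.mean(axis=0)
```

In the paper's approach, these kernel evaluations K_{h,y^(i)} enter the DML moment function with XGBoost-estimated nuisances rather than being averaged directly as above.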