DeepMed: Semiparametric Causal Mediation Analysis with Debiased Deep Learning

Authors: Siqi Xu, Lin Liu, Zhonghua Liu

NeurIPS 2022

Reproducibility assessment (variable, result, LLM response):
Research Type: Experimental. "Extensive synthetic experiments are conducted to support our findings and also expose the gap between theory and practice. As a proof of concept, we apply DeepMed to analyze two real datasets on machine learning fairness and reach conclusions consistent with previous findings."
Researcher Affiliation: Academia. Siqi Xu, Department of Statistics and Actuarial Sciences, The University of Hong Kong, Hong Kong SAR, China (sqxu@hku.hk); Lin Liu, Institute of Natural Sciences, MOE-LSC, School of Mathematical Sciences, CMA-Shanghai, and SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University and Shanghai Artificial Intelligence Laboratory, Shanghai, China (linliu@sjtu.edu.cn); Zhonghua Liu, Department of Biostatistics, Columbia University, New York, NY, USA (zl2509@cumc.columbia.edu).
Pseudocode: Yes. "Algorithm 1: DeepMed with V-fold cross-fitting"
Open Source Code: Yes. "Finally, a user-friendly R package can be found at https://github.com/siqixu/DeepMed."
Open Datasets: No. The paper uses synthetic data, which is generated, and analyzes real data from the COMPAS algorithm, citing Dressel and Farid (2018). However, it provides no direct link, DOI, or repository name for the COMPAS dataset, nor does it describe how to access the data or generate the synthetic data.
Dataset Splits: Yes. "We adopt a 3-fold cross-validation to choose the hyperparameters for DNNs (depth L, width K, L1-regularization parameter λ and epochs), RF (number of trees and maximum number of nodes) and GBM (number of trees and depth). We use a completely independent sample for the hyperparameter selection."
Hardware Specification: No. The paper states only: "The authors would also like to thank Department of Statistics and Actuarial Sciences at The University of Hong Kong for providing high-performance computing servers that supported the numerical experiments in this paper." This statement is too general and does not include specific hardware models (e.g., GPU or CPU models, or memory details).
Software Dependencies: No. "The Lasso is implemented using the R package hdm with a data-driven penalty. The DNN, RF and GBM are implemented using the R packages keras, randomForest and gbm, respectively." No version numbers are provided for these R packages (hdm, keras, randomForest, gbm).
Experiment Setup: Yes. "We adopt a 3-fold cross-validation to choose the hyperparameters for DNNs (depth L, width K, L1-regularization parameter λ and epochs), RF (number of trees and maximum number of nodes) and GBM (number of trees and depth)... We use the cross-entropy loss for the binary response and the mean-squared loss for the continuous response. We fix the batch size at 100, and the hyperparameters for the other methods are set to the default values in their R packages."
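The V-fold cross-fitting referenced in Algorithm 1 follows the standard debiased-machine-learning recipe: split the sample into V folds, fit the nuisance functions on the complement of each fold, and average the held-out evaluations. A minimal, hypothetical sketch in Python/numpy (not the authors' R implementation, and simplified to cross-fitting a single outcome regression rather than the full mediation functional):

```python
import numpy as np

def cross_fit_mean(X, y, fit, V=5, seed=0):
    """Generic V-fold cross-fitting: fit a nuisance regression on V-1 folds,
    evaluate it on the held-out fold, and average the out-of-fold predictions."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), V)
    preds = np.empty(n)
    for v in range(V):
        test = folds[v]
        train = np.concatenate([folds[u] for u in range(V) if u != v])
        model = fit(X[train], y[train])      # nuisance fit on the other folds
        preds[test] = model(X[test])         # evaluation only on held-out data
    return preds.mean()                      # cross-fitted estimate

# Toy nuisance learner (ordinary least squares); DeepMed would use a DNN here.
def ols(X, y):
    beta, *_ = np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)
    return lambda Xnew: np.c_[np.ones(len(Xnew)), Xnew] @ beta

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=500)
est = cross_fit_mean(X, y, ols, V=5)
```

Because each observation is only ever predicted by a model that never saw it during training, the estimator avoids the own-observation bias that motivates cross-fitting in semiparametric inference.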
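The hyperparameter selection described above (3-fold cross-validation over DNN depth, width, L1 penalty, and so on) can also be sketched generically. The ridge learner and penalty grid below are illustrative stand-ins, not the paper's keras/randomForest/gbm setup; the paper scores candidates with mean-squared loss for continuous responses and cross-entropy for binary ones:

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge regression: an illustrative stand-in for a DNN/RF/GBM.
    p = X.shape[1]
    beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    return lambda Xnew: Xnew @ beta

def cv_select(X, y, lams, folds=3, seed=0):
    """3-fold CV: pick the penalty with the smallest mean held-out MSE."""
    n = len(y)
    idx = np.array_split(np.random.default_rng(seed).permutation(n), folds)
    def cv_mse(lam):
        errs = []
        for v in range(folds):
            test = idx[v]
            train = np.concatenate([idx[u] for u in range(folds) if u != v])
            pred = ridge_fit(X[train], y[train], lam)(X[test])
            errs.append(np.mean((y[test] - pred) ** 2))
        return np.mean(errs)
    return min(lams, key=cv_mse)

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10))
y = X[:, 0] + 0.1 * rng.normal(size=300)
best_lam = cv_select(X, y, [0.01, 1.0, 100.0])
```

Selecting hyperparameters on held-out folds (or, as the paper notes, on a completely independent sample) keeps the tuning step from leaking information into the estimation sample.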