Efficient nonparametric statistical inference on population feature importance using Shapley values

Authors: Brian Williamson, Jean Feng

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the validity of our approach in a simulation study and estimate the SPVIM of hospital measurements for predicting mortality in the intensive care unit (ICU). All numerical results can be replicated using code available on GitHub at bdwilliamson/spvim_supplementary; the proposed methods are also implemented in the Python package vimpy and the R package vimp.
Researcher Affiliation | Academia | (1) Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA; (2) Department of Biostatistics, University of Washington, Seattle, WA.
Pseudocode | Yes | Algorithm 1: Estimation of SPVIM
Open Source Code | Yes | All numerical results can be replicated using code available on GitHub at bdwilliamson/spvim_supplementary; the proposed methods are also implemented in the Python package vimpy and the R package vimp.
Open Datasets | Yes | We now analyze data on patient stays in the ICU from the Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database (Silva et al., 2012).
Dataset Splits | Yes | An alternative approach is to perform K-fold cross-fitting, where we partition the data into K subsets of roughly equal size and, for each $k \in \{1, \ldots, K\}$, construct an estimator $f_{k,n,s}$ based on all the data except for the kth subset. ... the combination of these parameters was tuned using five-fold cross-validation to minimize the mean squared error (MSE).
Hardware Specification | Yes | All analyses were performed on a computer cluster with 32-core CPU nodes with 64 GB RAM.
Software Dependencies | No | The paper mentions software such as xgboost, the Python package vimpy, the R package vimp, and the Adam optimizer, but it does not specify version numbers for any of these dependencies.
Experiment Setup | Yes | To obtain each $f_{n,s}$ we fit boosted trees... with maximum tree depth equal to one, learning rate equal to $10^{-2}$, and $\ell_2$-regularization parameter equal to zero. The number of trees varied among $\{50, 100, 250, 500, 1000, \ldots, 3000\}$ and the $\ell_1$-regularization parameter varied among $\{10^{-3}, 10^{-2}, 0.1, 1, 5, 10\}$.
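The K-fold cross-fitting scheme quoted in the Dataset Splits row can be sketched as follows. This is a minimal illustration, not the authors' implementation (which is in vimpy and vimp). Several details are assumptions made here for illustration. Ordinary least squares stands in for the boosted-tree learner, and R-squared serves as the predictiveness measure. The function names make_folds, fit_ols, r_squared and cross_fit_r2 are hypothetical. The essential point is that each fold's estimator $f_{k,n,s}$ is fit without fold k and is evaluated only on fold k.

```python
import numpy as np


def make_folds(n, K, rng):
    """Randomly partition the indices 0..n-1 into K folds of roughly equal size."""
    return np.array_split(rng.permutation(n), K)


def fit_ols(X, y):
    """Stand-in learner (assumption): least squares with an intercept."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta


def r_squared(y, pred):
    """Illustrative predictiveness measure (assumption): 1 - MSE / Var(y)."""
    return 1.0 - np.mean((y - pred) ** 2) / np.var(y)


def cross_fit_r2(X, y, s, K=5, seed=0):
    """Cross-fitted predictiveness of the feature subset s (list of column indices).

    For each fold k, f_{k,n,s} is fit on all data except fold k and is
    evaluated only on fold k; the K fold-level scores are then averaged.
    """
    folds = make_folds(len(y), K, np.random.default_rng(seed))
    scores = []
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        if len(s) == 0:
            # Empty subset: the best constant predictor is the training mean.
            pred = np.full(len(test), y[train].mean())
        else:
            f_k = fit_ols(X[np.ix_(train, s)], y[train])
            pred = f_k(X[np.ix_(test, s)])
        scores.append(r_squared(y[test], pred))
    return float(np.mean(scores))
```

For example, with y = 2 * X[:, 0] plus small noise, cross_fit_r2(X, y, [0]) is close to one and cross_fit_r2(X, y, [1]) is close to zero. Scores of this kind, computed for many subsets s, are the raw material that a Shapley-based importance measure combines.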
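The tuning procedure in the Experiment Setup row can be sketched without external dependencies. It selects the number of trees and the $\ell_1$ parameter by five-fold cross-validated MSE, as stated under Dataset Splits. This sketch rests on several assumptions:

- boost_stumps is a simplified, hand-written stand-in for xgboost, not the library itself.
- It fits depth-one trees to squared-error residuals with learning rate 0.01.
- It applies the $\ell_1$ parameter by soft-thresholding each leaf's residual sum, in the spirit of xgboost's reg_alpha with the $\ell_2$ penalty fixed at zero.
- It starts from the training mean.
- The tree counts elided between 1000 and 3000 in the source are not filled in.
- All function names are hypothetical.

```python
import itertools

import numpy as np

# Grid from the Experiment Setup row. The source elides the tree counts
# between 1000 and 3000, so only the listed values are used here.
N_TREES_GRID = [50, 100, 250, 500, 1000, 3000]
L1_GRID = [1e-3, 1e-2, 0.1, 1, 5, 10]
LEARNING_RATE = 1e-2  # fixed, as reported; the l2 penalty is fixed at zero


def soft_threshold(g, alpha):
    """Shrink a residual sum toward zero by alpha (l1 penalty on leaf values)."""
    return np.sign(g) * np.maximum(np.abs(g) - alpha, 0.0)


def boost_stumps(X, y, X_eval, max_trees, alpha, checkpoints):
    """Stand-in booster (assumption): depth-one trees on squared-error residuals.

    Returns {m: predictions on X_eval after m trees} for each m in checkpoints.
    """
    n, p = X.shape
    pred = np.full(n, y.mean())            # starting value: training mean
    pred_eval = np.full(len(X_eval), y.mean())
    orders = [np.argsort(X[:, j], kind="stable") for j in range(p)]
    out = {}
    for m in range(1, max_trees + 1):
        r = y - pred
        # Default stump: everything goes left with value 0 (no update).
        best = (-np.inf, (0, np.inf, 0.0, 0.0))
        for j in range(p):
            xs, rs = X[orders[j], j], r[orders[j]]
            G_left = np.cumsum(rs)[:-1]    # residual sum in the left leaf
            H_left = np.arange(1, n)       # number of points in the left leaf
            G_right, H_right = rs.sum() - G_left, n - H_left
            gain = (soft_threshold(G_left, alpha) ** 2 / H_left
                    + soft_threshold(G_right, alpha) ** 2 / H_right)
            gain[xs[:-1] == xs[1:]] = -np.inf  # cannot split between tied values
            i = int(np.argmax(gain))
            if gain[i] > best[0]:
                thr = (xs[i] + xs[i + 1]) / 2
                w_l = soft_threshold(G_left[i], alpha) / H_left[i]
                w_r = soft_threshold(G_right[i], alpha) / H_right[i]
                best = (gain[i], (j, thr, w_l, w_r))
        j, thr, w_l, w_r = best[1]
        pred += LEARNING_RATE * np.where(X[:, j] <= thr, w_l, w_r)
        pred_eval += LEARNING_RATE * np.where(X_eval[:, j] <= thr, w_l, w_r)
        if m in checkpoints:
            out[m] = pred_eval.copy()
    return out


def tune_by_cv(X, y, n_trees_grid=N_TREES_GRID, l1_grid=L1_GRID, K=5, seed=0):
    """Pick (n_trees, l1) minimizing the mean of the K per-fold test MSEs."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), K)
    mse = {combo: 0.0 for combo in itertools.product(n_trees_grid, l1_grid)}
    for alpha in l1_grid:
        for k, test in enumerate(folds):
            train = np.concatenate([f for i, f in enumerate(folds) if i != k])
            # One fit per (alpha, fold) gives predictions at every tree count.
            preds = boost_stumps(X[train], y[train], X[test],
                                 max(n_trees_grid), alpha, set(n_trees_grid))
            for m in n_trees_grid:
                mse[(m, alpha)] += np.mean((y[test] - preds[m]) ** 2) / K
    best = min(mse, key=mse.get)
    return best, mse
```

The booster is run once per ($\ell_1$ value, fold) with the largest tree count, and predictions are recorded at each smaller count. This lets the number of trees be tuned without refitting. Running the full 36-point grid this way is slow in pure Python, so a quick trial is better done with smaller grids passed to tune_by_cv.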