Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Efficient nonparametric statistical inference on population feature importance using Shapley values
Authors: Brian Williamson, Jean Feng
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the validity of our approach in a simulation study and estimate the SPVIM of hospital measurements for predicting mortality in the intensive care unit (ICU). All numerical results can be replicated using code available on GitHub at bdwilliamson/spvim_supplementary; the proposed methods are also implemented in the Python package vimpy and the R package vimp. |
| Researcher Affiliation | Academia | 1Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 2Department of Biostatistics, University of Washington, Seattle, WA. |
| Pseudocode | Yes | Algorithm 1 Estimation of SPVIM |
| Open Source Code | Yes | All numerical results can be replicated using code available on GitHub at bdwilliamson/spvim_supplementary; the proposed methods are also implemented in the Python package vimpy and the R package vimp. |
| Open Datasets | Yes | We now analyze data on patient stays in the ICU from the Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database (Silva et al., 2012). |
| Dataset Splits | Yes | An alternative approach is to perform K-fold cross-fitting, where we partition the data into K subsets of roughly equal size and, for each k ∈ {1, . . . , K}, construct an estimator f_{k,n,s} based on all the data except for the kth subset. ... the combination of these parameters was tuned using five-fold cross-validation to minimize the mean squared error (MSE). |
| Hardware Specification | Yes | All analyses were performed on a computer cluster with 32-core CPU nodes with 64 GB RAM. |
| Software Dependencies | No | The paper mentions software like 'xgboost', 'Python package vimpy', 'R package vimp', and 'Adam' optimizer, but it does not specify version numbers for these software dependencies. |
| Experiment Setup | Yes | To obtain each f_{n,s} we fit boosted trees... with maximum tree depth equal to one, learning rate equal to 10⁻², and ℓ2-regularization parameter equal to zero. The number of trees varied among {50, 100, 250, 500, 1000, . . . , 3000} and the ℓ1-regularization parameter varied among {10⁻³, 10⁻², 0.1, 1, 5, 10}. |
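The K-fold cross-fitting procedure quoted in the Dataset Splits row (fit on all folds but the kth, predict on the kth) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the `cross_fit` helper and the toy least-squares estimator are hypothetical stand-ins for the boosted-tree learners used in the paper.

```python
import numpy as np

def cross_fit(X, y, fit, K=5, seed=0):
    """K-fold cross-fitting: for each fold k, fit an estimator on all
    data except the kth subset, then predict on the held-out subset."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = rng.permutation(n) % K  # K subsets of roughly equal size
    preds = np.empty(n)
    for k in range(K):
        held_out = folds == k
        model = fit(X[~held_out], y[~held_out])  # train on the other K-1 folds
        preds[held_out] = model(X[held_out])     # out-of-fold predictions
    return preds

# Toy estimator standing in for the paper's boosted trees:
# ordinary least squares, returned as a prediction function.
def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda X_new: X_new @ beta

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)

yhat = cross_fit(X, y, ols, K=5)
mse = np.mean((y - yhat) ** 2)
```

Because every prediction comes from a model that never saw that observation, the resulting MSE is an honest out-of-sample estimate, which is what makes cross-fitting useful for the downstream inference.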
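The Experiment Setup row describes tuning regularization over a small grid by five-fold cross-validated MSE. A minimal sketch of that selection loop, assuming ridge regression in place of the paper's xgboost boosted trees (the `cv_mse` helper and the data are illustrative, not from the paper):

```python
import numpy as np

def cv_mse(X, y, lam, K=5, seed=0):
    """Five-fold cross-validated MSE of ridge regression with penalty lam."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % K
    errs = []
    for k in range(K):
        tr, te = folds != k, folds == k
        p = X.shape[1]
        # closed-form ridge solution on the training folds
        beta = np.linalg.solve(X[tr].T @ X[tr] + lam * np.eye(p),
                               X[tr].T @ y[tr])
        errs.append(np.mean((y[te] - X[te] @ beta) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = X @ np.array([2.0, -1.0, 0.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=300)

# same penalty grid as reported in the paper
grid = [1e-3, 1e-2, 0.1, 1, 5, 10]
best = min(grid, key=lambda lam: cv_mse(X, y, lam))
```

In the paper this grid search ran jointly over the number of trees and the ℓ1 penalty; the pattern is the same: score every grid point by cross-validated MSE and keep the minimizer.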