Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Efficient nonparametric statistical inference on population feature importance using Shapley values

Authors: Brian Williamson, Jean Feng

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the validity of our approach in a simulation study and estimate the SPVIM of hospital measurements for predicting mortality in the intensive care unit (ICU). All numerical results can be replicated using code available on GitHub at bdwilliamson/spvim_supplementary; the proposed methods are also implemented in the Python package vimpy and the R package vimp.
Researcher Affiliation | Academia | (1) Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA; (2) Department of Biostatistics, University of Washington, Seattle, WA.
Pseudocode | Yes | Algorithm 1: Estimation of SPVIM
Open Source Code | Yes | All numerical results can be replicated using code available on GitHub at bdwilliamson/spvim_supplementary; the proposed methods are also implemented in the Python package vimpy and the R package vimp.
Open Datasets | Yes | We now analyze data on patients' stays in the ICU from the Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database (Silva et al., 2012).
Dataset Splits | Yes | An alternative approach is to perform K-fold cross-fitting, where we partition the data into K subsets of roughly equal size and, for each k ∈ {1, ..., K}, construct an estimator f_{k,n,s} based on all the data except for the kth subset. ... the combination of these parameters was tuned using five-fold cross-validation to minimize the mean squared error (MSE).
Hardware Specification | Yes | All analyses were performed on a computer cluster with 32-core CPU nodes with 64 GB RAM.
Software Dependencies | No | The paper mentions software like 'xgboost', 'Python package vimpy', 'R package vimp', and 'Adam' optimizer, but it does not specify version numbers for these software dependencies.
Experiment Setup | Yes | To obtain each f_{n,s} we fit boosted trees... with maximum tree depth equal to one, learning rate equal to 10^{-2}, and ℓ2-regularization parameter equal to zero. The number of trees varied among {50, 100, 250, 500, 1000, ..., 3000} and the ℓ1-regularization parameter varied among {10^{-3}, 10^{-2}, 0.1, 1, 5, 10}.
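
The K-fold cross-fitting described in the Dataset Splits entry can be illustrated with a minimal sketch. It only shows how the data indices might be partitioned into K subsets of roughly equal size, with each estimator trained on everything except the kth subset. The function names `k_fold_partition` and `cross_fit_splits`, the shuffling, and the seed are assumptions made for illustration; they are not taken from the paper, vimpy, or vimp.

```python
import random


def k_fold_partition(n, k, seed=0):
    """Split indices 0..n-1 into k disjoint folds of roughly equal size.

    Hypothetical helper for illustration; not part of vimpy or vimp.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Taking every k-th index makes fold sizes differ by at most one.
    return [sorted(indices[i::k]) for i in range(k)]


def cross_fit_splits(n, k, seed=0):
    """Yield (train, held_out) index lists, one pair per fold.

    The training indices for fold k are all indices outside the kth subset.
    """
    folds = k_fold_partition(n, k, seed)
    for held_out_id, held_out in enumerate(folds):
        train = sorted(
            idx
            for fold_id, fold in enumerate(folds)
            if fold_id != held_out_id
            for idx in fold
        )
        yield train, held_out


# Example: 10 observations, five folds.
splits = list(cross_fit_splits(n=10, k=5))
for train, held_out in splits:
    print(f"fit on {len(train)} observations, hold out {held_out}")
```

In an analysis like the one summarized above, an estimator would be fit on each `train` list and evaluated on the matching `held_out` list. The fitting step is omitted here because it depends on the chosen learner.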