Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Understanding Global Feature Contributions With Additive Importance Measures

Authors: Ian Covert, Scott M. Lundberg, Su-In Lee

NeurIPS 2020

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Our experiments show that SAGE can be calculated efficiently and that it assigns more accurate importance values than other methods." "We now evaluate SAGE by comparing it with several baseline methods. For simplicity we only consider model-agnostic baselines, including permutation tests, mean importance, feature ablation and univariate predictors (see Section 2.3). For datasets, we used MNIST [19], a bike sharing demand dataset [10], the German credit quality dataset [21], the Portuguese bank marketing dataset [26], and a breast cancer (BRCA) subtype classification dataset [4, 39]." |
| Researcher Affiliation | Collaboration | Ian C. Covert, University of Washington, Seattle, WA; Scott Lundberg, Microsoft Research, Redmond, WA; Su-In Lee, University of Washington, Seattle, WA |
| Pseudocode | Yes | "Supplement D describes the SAGE sampling algorithm (Algorithm 1) and the changes to its properties in more detail." |
| Open Source Code | Yes | http://github.com/iancovert/sage/ |
| Open Datasets | Yes | "For datasets, we used MNIST [19], a bike sharing demand dataset [10], the German credit quality dataset [21], the Portuguese bank marketing dataset [26], and a breast cancer (BRCA) subtype classification dataset [4, 39]." |
| Dataset Splits | Yes | "For datasets, we used MNIST [19]..." Figure 3 ("Identifying corrupted features with SAGE. ... Top right: SAGE comparison to identify corruption in month feature") reports results on a validation set. |
| Hardware Specification | No | The paper describes the datasets and models used (e.g., XGBoost, CatBoost, MLP) but does not provide specific details about the hardware (e.g., GPU models, CPU types, memory, or cloud resources) on which the experiments were run. |
| Software Dependencies | No | The paper mentions specific software libraries such as "XGBoost [8]" and "CatBoost [29]", as well as regularized logistic regression, but does not provide version numbers for these dependencies. |
| Experiment Setup | No | The paper names the model used for each dataset ("XGBoost for the bike data", "CatBoost for the bank and credit data", "regularized logistic regression for the BRCA data", a "multi-layer perceptron (MLP) for MNIST") but does not provide experimental setup details such as hyperparameter values, learning rates, batch sizes, or training schedules. |
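For context on the "permutation tests" baseline that the Research Type evidence mentions, a minimal sketch of generic permutation-based feature importance follows. This is not the authors' SAGE implementation (SAGE uses Shapley-value sampling, per Algorithm 1 in their Supplement D); the `model`, `loss_fn`, and parameter names here are illustrative placeholders.

```python
import numpy as np

def permutation_importance(model, X, y, loss_fn, n_repeats=10, rng=None):
    """Generic permutation-test importance: measure how much the loss
    increases when a single feature column is randomly shuffled,
    breaking its association with the labels."""
    rng = np.random.default_rng(rng)
    base_loss = loss_fn(y, model.predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        deltas = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            rng.shuffle(X_perm[:, j])  # shuffle only column j in place
            deltas.append(loss_fn(y, model.predict(X_perm)) - base_loss)
        importances[j] = np.mean(deltas)  # average loss increase
    return importances
```

Unlike SAGE, this baseline perturbs one feature at a time and therefore cannot credit features for interactions, which is one motivation the paper gives for additive, Shapley-based importance.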