FastSHAP: Real-Time Shapley Value Estimation

Authors: Neil Jethani, Mukund Sudarshan, Ian Connick Covert, Su-In Lee, Rajesh Ranganath

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments with tabular and image datasets, we compare FastSHAP to existing estimation approaches and find that it generates accurate explanations with an orders-of-magnitude speedup.
Researcher Affiliation | Academia | Neil Jethani (New York University), Mukund Sudarshan (New York University), Ian Covert (University of Washington), Su-In Lee (University of Washington), Rajesh Ranganath (New York University)
Pseudocode | Yes | Algorithm 1: FastSHAP training (a minimal sketch of this training objective appears after the table)
Open Source Code | Yes | Code to implement FastSHAP is available online in two separate repositories: https://github.com/iancovert/fastshap contains a PyTorch implementation and https://github.com/neiljethani/fastshap/ contains a TensorFlow implementation, both with examples of tabular and image data experiments.
Open Datasets | Yes | Our experiments use data from a 1994 United States census, a bank marketing campaign, bankruptcy statistics, and online news articles (Dua and Graff, 2017). The census data contains 12 input features, and the binary label indicates whether a person makes over $50K a year (Kohavi et al., 1996). The marketing dataset contains 17 input features, and the label indicates whether the customer subscribed to a term deposit (Moro et al., 2014). The bankruptcy dataset contains 96 features describing various companies and whether they went bankrupt (Liang et al., 2016). The news dataset contains 60 numerical features about articles published on Mashable, and our label indicates whether the share count exceeds the median number (1400) (Fernandes et al., 2015). The datasets were each split 80/10/10 for training, validation, and testing.
Dataset Splits | Yes | The datasets were each split 80/10/10 for training, validation, and testing. (A split sketch appears after the table.)
Hardware Specification | Yes | The image experiments were run using 8 cores of an Intel Xeon Gold 6148 processor and a single NVIDIA Tesla V100.
Software Dependencies | No | The paper mentions software like PyTorch, TensorFlow, LightGBM, XGBoost, the shap package, and the tf-explain package, but does not specify their version numbers.
Experiment Setup | Yes | The models are trained using Adam with a learning rate of 10^-3, and we use a learning rate scheduler that multiplies the learning rate by a factor of 0.5 after 3 epochs of no validation loss improvement. Early stopping was triggered after the validation loss ceased to improve for 10 epochs. Each model is trained using Adam with a learning rate of 10^-3, and we use a learning rate scheduler that multiplies the learning rate by a factor of 0.8 after 3 epochs of no validation loss improvement. Early stopping was triggered after the validation loss ceased to improve for 20 epochs. (An optimization-setup sketch appears after the table.)
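
The Pseudocode row records that the paper provides Algorithm 1 for FastSHAP training. As a rough illustration only, here is a minimal PyTorch sketch of the weighted least-squares idea behind it: an explainer network is regressed onto the value function under coalitions sampled from the Shapley kernel distribution. The explainer architecture, the `value_function` helper, and the sampling code are assumptions for illustration, not the authors' Algorithm 1; the paper's efficiency constraint (e.g., additive efficient normalization) is omitted here.

```python
import torch

d = 12  # number of input features (e.g., the census dataset)

# Shapley kernel distribution over coalition sizes 1..d-1 (sizes 0 and d
# carry infinite kernel weight and are handled by constraints instead).
sizes = torch.arange(1, d)
size_probs = 1.0 / (sizes * (d - sizes))
size_probs /= size_probs.sum()

def sample_masks(batch):
    """Sample coalitions S with p(S) proportional to the Shapley kernel."""
    k = sizes[torch.multinomial(size_probs, batch, replacement=True)]
    masks = torch.zeros(batch, d)
    for i in range(batch):
        masks[i, torch.randperm(d)[: int(k[i])]] = 1.0
    return masks

# phi(x; theta): maps an input to d estimated Shapley values (toy architecture).
explainer = torch.nn.Sequential(
    torch.nn.Linear(d, 64), torch.nn.ReLU(), torch.nn.Linear(64, d))
optimizer = torch.optim.Adam(explainer.parameters(), lr=1e-3)

def training_step(x, value_function):
    # value_function(x, S) is assumed to return v(x, S) per sample, e.g.,
    # a surrogate model evaluated on masked inputs (an assumption here).
    S = sample_masks(x.shape[0])
    phi = explainer(x)
    target = value_function(x, S) - value_function(x, torch.zeros_like(S))
    pred = (S * phi).sum(dim=1)       # 1_S^T phi(x)
    loss = ((target - pred) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key design point is that amortization moves the per-sample cost of Shapley estimation into training: after training, a single forward pass of `explainer` yields the explanation.

The Dataset Splits row reports an 80/10/10 train/validation/test split. A minimal sketch of one way to reproduce such a split with scikit-learn follows; the helper name and seed are illustrative, not taken from the paper.

```python
from sklearn.model_selection import train_test_split

def split_80_10_10(X, y, seed=0):
    # Carve off 20% for validation + test, then split that portion in half.
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.5, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test
```

Finally, the Experiment Setup row describes Adam at 10^-3 with a reduce-on-plateau scheduler and early stopping. Below is a self-contained PyTorch sketch of that optimization setup under the first reported configuration (factor 0.5, patience 3, early stopping after 10 stale epochs); the toy model, data, and validation stand-in are assumptions for illustration.

```python
import torch

torch.manual_seed(0)
X = torch.randn(256, 12)             # toy stand-in for 12-feature tabular data
y = torch.randint(0, 2, (256,))      # toy binary labels

model = torch.nn.Linear(12, 2)
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Multiply the LR by 0.5 after 3 epochs of no improvement (the paper's
# second configuration uses factor 0.8 with a 20-epoch stopping patience).
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=3)

best_val, stale, patience = float('inf'), 0, 10
for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

    val_loss = loss.item()           # stand-in for a real validation pass
    scheduler.step(val_loss)
    if val_loss < best_val:
        best_val, stale = val_loss, 0
    else:
        stale += 1
        if stale >= patience:        # early stopping
            break
```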
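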
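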