reproducibilityindex.ai

SHAP-IQ: Unified Approximation of any-order Shapley Interactions

Authors: Fabian Fumagalli, Maximilian Muschalik, Patrick Kolpaczki, Eyke Hüllermeier, Barbara Hammer

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We illustrate the computational efficiency and effectiveness by explaining language, image classification and high-dimensional synthetic models. We use SHAP-IQ to compute any-order n-Shapley Values on different ML models and demonstrate that it outperforms existing baseline methods. We conduct multiple experiments to illustrate the approximation quality of SHAP-IQ compared to current baseline approaches.
Researcher Affiliation	Academia	Fabian Fumagalli, Bielefeld University, CITEC, D-33619 Bielefeld, Germany, ffumagalli@techfak.uni-bielefeld.de; Maximilian Muschalik, LMU Munich, MCML Munich, D-80539 Munich, Germany, maximilian.muschalik@ifi.lmu.de; Patrick Kolpaczki, Paderborn University, D-33098 Paderborn, Germany, patrick.kolpaczki@upb.de; Eyke Hüllermeier, LMU Munich, MCML Munich, D-80539 Munich, Germany, eyke@ifi.lmu.de; Barbara Hammer, Bielefeld University, CITEC, D-33619 Bielefeld, Germany, bhammer@techfak.uni-bielefeld.de
Pseudocode	Yes	Algorithm 1: SHAP-IQ for any-order interactions $S_{s_0}$ up to order $s_0$; Algorithm 2: Determine the sampling order $k_0$ for the deterministic part; Algorithm 3: Sample a subset $T \sim p(T)$; Algorithm 4: Welford Algorithm for Mean and Variance [45].
Open Source Code	Yes	The shapiq package extends the well-known shap library and can be found at https://pypi.org/project/shapiq/.
Open Datasets	Yes	For a language model (LM), we use a fine-tuned version of the DistilBERT transformer architecture [34] on movie review sentences from the original IMDB dataset [27, 22] for sentiment analysis... For an image classification model (ICM), we use ResNet18 [19] pre-trained on ImageNet [11] as provided by torch [30].
Dataset Splits	No	The paper describes using pre-trained models (DistilBERT fine-tuned on IMDB, ResNet18 pre-trained on ImageNet) and randomly sampling instances for explanation. It does not provide specific train/validation/test splits for its own experimental setup.
Hardware Specification	Yes	The experiments concerning the approximation quality of SHAP-IQ compared to the baselines were run on a computation cluster with hyperthreaded Intel Xeon E5-2697 v3 CPUs clocked at 2.6 GHz. Before running the experiments on the cluster, the implementations were validated on a Dell XPS 15 9510 containing an Intel i7-11800H at 2.30 GHz.
Software Dependencies	No	No specific version numbers for key software dependencies like PyTorch, transformers, or scikit-image are explicitly provided in the text, only their names and citations.
Experiment Setup	Yes	We randomly sample 50 reviews of length d = 14 and explain each model prediction. To obtain the prediction of different coalitions, we pre-compute super-pixels with SLIC [1, 44] to obtain a function on d = 14 features and apply mean imputation on absent features. For a high-dimensional synthetic model with d = 30, we use a sum of unanimity model (SOUM) $\nu(T) := \sum_{n=1}^{N} a_n \mathbb{1}(Q_n \subseteq T)$, where N = 50 interaction subsets $Q_1, \dots, Q_N \subseteq D$ are chosen uniformly from all subset sizes and $a_1, \dots, a_N \in \mathbb{R}$ are generated uniformly, $a_n \sim \mathrm{unif}([0, 1])$. For each iteration we evaluate the approximation method with different budgets up to a maximum budget of 2^14 model evaluations.
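
The sum of unanimity model in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not taken from the paper or from the shapiq package: the names `make_soum` and `nu`, the seed handling, and the choice to draw subset sizes uniformly from 1 to d (so the empty set is excluded) are assumptions made for illustration only.

```python
import random


def make_soum(d=30, n_terms=50, seed=0):
    """Build a hypothetical sum of unanimity model on players 0..d-1.

    Assumption: each interaction subset Q_n gets a size drawn uniformly
    from 1..d, then a uniformly random subset of that size.
    """
    rng = random.Random(seed)
    players = list(range(d))
    terms = []
    for _ in range(n_terms):
        size = rng.randint(1, d)                  # uniform over subset sizes
        subset = frozenset(rng.sample(players, size))
        coeff = rng.uniform(0.0, 1.0)             # a_n ~ unif([0, 1])
        terms.append((coeff, subset))

    def nu(coalition):
        """Value of coalition T: sum of a_n over all n with Q_n subset of T."""
        t = frozenset(coalition)
        return sum(a for a, q in terms if q <= t)

    return nu, terms


# Usage: the grand coalition collects every coefficient, the empty one none.
nu, terms = make_soum(d=30, n_terms=50, seed=0)
grand_value = nu(range(30))
empty_value = nu([])
```

Each call to `nu` counts as one model evaluation, so a budget of 2^14 would correspond to at most 16,384 such calls per explained instance.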
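
The Pseudocode row lists a Welford algorithm for mean and variance as Algorithm 4. The sketch below is the standard textbook form of that online update, not a transcription of the paper's listing; the class name `RunningStats` and its method names are hypothetical.

```python
class RunningStats:
    """Online mean and sample variance via Welford's update (textbook form)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)  # uses the already updated mean

    def variance(self):
        """Unbiased sample variance; 0.0 until two observations are seen."""
        if self.count < 2:
            return 0.0
        return self.m2 / (self.count - 1)


# Usage: feed values one at a time without storing them.
stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)
```

A single pass with constant memory is the reason to prefer this update when estimates arrive one sample at a time.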