Learning to Estimate Shapley Values with Vision Transformers

Authors: Ian Connick Covert, Chanwoo Kim, Su-In Lee

ICLR 2023 | Conference PDF | Archive PDF

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments compare Shapley values to many baseline methods (e.g., attention rollout, Grad-CAM, LRP), and we find that our approach provides more accurate explanations than existing methods for ViTs.
Researcher Affiliation | Academia | Ian Covert, Chanwoo Kim & Su-In Lee, Paul G. Allen School of Computer Science & Engineering, University of Washington. {icovert,chanwkim,suinlee}@cs.washington.edu
Pseudocode | Yes | Algorithm 1: Explainer training. Input: coalitional game v_xy(s), learning rate α. Output: explainer ϕ_ViT(x, y; θ). Initialize ϕ_ViT(x, y; θ); while not converged do: sample (x, y) ~ p(x, y) and s ~ p_Sh(s); predict ϕ ← ϕ_ViT(x, y; θ); set ϕ ← ϕ + (1/d)(v_xy(1) - v_xy(0) - 1ᵀϕ); calculate L ← (v_xy(s) - v_xy(0) - sᵀϕ)²; update θ ← θ - α ∇_θ L; end. (A PyTorch sketch of this loop appears below the table.)
Open Source Code | Yes | https://github.com/suinleelab/vit-shapley
Open Datasets | Yes | Our experiments are based on three image datasets: Imagenette, a natural image dataset consisting of ten ImageNet classes (Howard and Gugger, 2020; Deng et al., 2009); MURA, a medical image dataset of musculoskeletal radiographs classified as normal or abnormal (Rajpurkar et al., 2017); and the Oxford-IIIT Pets dataset, which has 37 classes (Parkhi et al., 2012).
Dataset Splits | Yes | The Imagenette dataset contains 9,469 training examples and 3,925 validation examples, and we split the validation data to obtain validation and test sets containing 1,962 examples each. The MURA dataset contains 36,808 training examples and 3,197 validation examples. We use the validation examples as a test set, and we split the training examples to obtain train and validation sets containing 33,071 and 3,737 examples, ensuring that images from the same patient belong to a single split. The Oxford-IIIT Pets dataset contains 7,349 examples for 37 classes, and we split the data to obtain train, validation, and test sets containing 5,879, 735, and 735 examples, respectively. (A split-reproduction sketch appears below the table.)
Hardware Specification | Yes | We used a machine with 2 GeForce RTX 2080 Ti GPUs to train the explainer model.
Software Dependencies | No | The paper mentions software like PyTorch Lightning, Captum, and Seaborn, but does not provide specific version numbers for these software dependencies (e.g., 'PyTorch Lightning 1.x' or 'Captum 0.x').
Experiment Setup | Yes | When training the original classifier and fine-tuned classifier models, we used a learning rate of 10⁻⁵ and trained for 25 epochs and 50 epochs, respectively. (...) We used the AdamW optimizer (Loshchilov and Hutter, 2018) with a cosine learning rate schedule and a maximum learning rate of 10⁻⁴, and we trained the model for 100 epochs, selecting the best model based on the validation loss. We used standard data augmentation steps: random resized crops, vertical flips, horizontal flips, and color jittering including brightness, contrast, saturation, and hue. We used minibatches of size 64 with 32 subset samples s per x sample, and we found that using a tanh nonlinearity on the explainer predictions was helpful to stabilize training. (A configuration sketch appears below the table.)
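The following is a minimal PyTorch sketch of Algorithm 1, assuming an `explainer(x)` that returns per-patch, per-class Shapley value predictions and a `value_fn(x, s)` that evaluates the coalitional game v_xy(s) on masked inputs. Both names, and the uniform-cardinality subset sampler used in place of the exact Shapley distribution p_Sh(s), are illustrative assumptions rather than the repository's actual API.

```python
import torch

def train_explainer(explainer, value_fn, loader, num_patches, lr=1e-4, epochs=100):
    """Sketch of Algorithm 1 (explainer training); not the official implementation.

    explainer(x) -> (batch, num_patches, num_classes) Shapley value predictions.
    value_fn(x, s) -> (batch, num_classes) game values v_xy(s) for patch subsets s.
    """
    opt = torch.optim.AdamW(explainer.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in loader:
            # Sample subsets s; the paper samples s ~ p_Sh(s) (the Shapley
            # distribution), approximated here by uniform cardinalities.
            k = torch.randint(1, num_patches, (x.size(0),))
            s = (torch.rand(x.size(0), num_patches).argsort(dim=1) < k[:, None]).float()

            phi = explainer(x)                            # predict phi
            v1 = value_fn(x, torch.ones_like(s))          # v_xy(1)
            v0 = value_fn(x, torch.zeros_like(s))         # v_xy(0)
            vs = value_fn(x, s)                           # v_xy(s)

            # Additive efficient normalization: phi <- phi + (1/d)(v(1) - v(0) - 1'phi)
            gap = (v1 - v0 - phi.sum(dim=1)) / num_patches
            phi = phi + gap[:, None, :]

            # Loss: (v(s) - v(0) - s'phi)^2, averaged over classes and the batch
            pred = torch.einsum('bd,bdc->bc', s, phi)
            loss = ((vs - v0 - pred) ** 2).mean()

            opt.zero_grad()
            loss.backward()
            opt.step()
```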
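A hedged sketch of how the reported dataset splits could be reproduced; the random seeds and helper names are assumptions and may differ from the repository's split logic.

```python
import torch
from torch.utils.data import Dataset, random_split

def split_imagenette_val(val_dataset: Dataset, seed: int = 0):
    """Split Imagenette's 3,925 validation images into validation/test halves
    (the paper reports 1,962 examples each); the seed is an assumption."""
    n_val = len(val_dataset) // 2
    gen = torch.Generator().manual_seed(seed)
    return random_split(val_dataset, [n_val, len(val_dataset) - n_val], generator=gen)

def split_pets(pets_dataset: Dataset, seed: int = 0):
    """Oxford-IIIT Pets: 7,349 examples -> 5,879 / 735 / 735 train/val/test."""
    gen = torch.Generator().manual_seed(seed)
    return random_split(pets_dataset, [5879, 735, 735], generator=gen)

# MURA is split at the patient level (all images from one patient stay in one
# split), so it requires grouping by patient ID rather than a plain random_split.
```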
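The explainer training configuration quoted above can be sketched as follows. The optimizer, schedule, maximum learning rate, epoch count, batch/subset sizes, and tanh output come from the reported setup; the augmentation strengths, crop size, and function names are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import transforms

# Augmentations listed in the setup; the jitter strengths and crop size are
# assumptions, not values reported in the paper.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomVerticalFlip(),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1),
    transforms.ToTensor(),
])

def make_optimizer(explainer: nn.Module, steps_per_epoch: int, epochs: int = 100):
    """AdamW with a cosine schedule and a maximum learning rate of 1e-4."""
    optimizer = torch.optim.AdamW(explainer.parameters(), lr=1e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=epochs * steps_per_epoch)
    return optimizer, scheduler

# Minibatches of 64 images, each paired with 32 sampled subsets s; a tanh
# nonlinearity on the explainer's raw outputs was reported to stabilize training.
BATCH_SIZE, NUM_SUBSET_SAMPLES = 64, 32

def predict_shapley(explainer: nn.Module, x: torch.Tensor) -> torch.Tensor:
    return torch.tanh(explainer(x))
```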