Learning to Estimate Shapley Values with Vision Transformers
Authors: Ian Connick Covert, Chanwoo Kim, Su-In Lee
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments compare Shapley values to many baseline methods (e.g., attention rollout, Grad-CAM, LRP), and we find that our approach provides more accurate explanations than existing methods for ViTs. |
| Researcher Affiliation | Academia | Ian Covert, Chanwoo Kim & Su-In Lee, Paul G. Allen School of Computer Science & Engineering, University of Washington. {icovert,chanwkim,suinlee}@cs.washington.edu |
| Pseudocode | Yes | Algorithm 1: Explainer training. Input: coalitional game v_xy(s), learning rate α. Output: explainer φ_ViT(x, y; θ). Initialize φ_ViT(x, y; θ); while not converged do: sample (x, y) ~ p(x, y) and s ~ p_Sh(s); predict φ ← φ_ViT(x, y; θ); set φ ← φ + d⁻¹(v_xy(1) − v_xy(0) − 1⊤φ); calculate L ← (v_xy(s) − v_xy(0) − s⊤φ)²; update θ ← θ − α∇_θL; end. (A runnable sketch of this loop follows the table.) |
| Open Source Code | Yes | https://github.com/suinleelab/vit-shapley |
| Open Datasets | Yes | Our experiments are based on three image datasets: Imagenette, a natural image dataset consisting of ten ImageNet classes (Howard and Gugger, 2020; Deng et al., 2009), MURA, a medical image dataset of musculoskeletal radiographs classified as normal or abnormal (Rajpurkar et al., 2017), and the Oxford-IIIT Pets dataset, which has 37 classes (Parkhi et al., 2012). |
| Dataset Splits | Yes | The Imagenette dataset contains 9,469 training examples and 3,925 validation examples, and we split the validation data to obtain validation and test sets containing 1,962 examples each. The MURA dataset contains 36,808 training examples and 3,197 validation examples. We use the validation examples as a test set, and we split the training examples to obtain train and validation sets containing 33,071 and 3,737 examples, ensuring that images from the same patient belong to a single split. The Oxford-IIIT Pets dataset contains 7,349 examples for 37 classes, and we split the data to obtain train, validation, and test sets containing 5,879, 735, and 735 examples, respectively. (A split-construction sketch follows the table.) |
| Hardware Specification | Yes | We used a machine with 2 GeForce RTX 2080Ti GPUs to train the explainer model. |
| Software Dependencies | No | The paper mentions software like PyTorch Lightning, Captum, and Seaborn, but does not provide specific version numbers for these software dependencies (e.g., 'PyTorch Lightning 1.x' or 'Captum 0.x'). |
| Experiment Setup | Yes | When training the original classifier and fine-tuned classifier models, we used a learning rate of 10⁻⁵ and trained for 25 epochs and 50 epochs, respectively. (...) We used the AdamW optimizer (Loshchilov and Hutter, 2018) with a cosine learning rate schedule and a maximum learning rate of 10⁻⁴, and we trained the model for 100 epochs, selecting the best model based on the validation loss. We used standard data augmentation steps: random resized crops, vertical flips, horizontal flips, and color jittering including brightness, contrast, saturation, and hue. We used minibatches of size 64 with 32 subset samples s per x sample, and we found that using a tanh nonlinearity on the explainer predictions was helpful to stabilize training. (A configuration sketch follows the table.) |
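
Algorithm 1 maps directly onto a short training loop. Below is a minimal PyTorch sketch of one optimization step, assuming an `explainer(x, y)` module that outputs per-patch Shapley estimates and a `game(x, s)` function that evaluates the coalitional game v_xy(s) (e.g., a masked surrogate classifier's prediction for class y); these names and the Shapley-kernel sampler are illustrative, not the authors' exact implementation.

```python
import torch

def sample_coalitions(d: int, batch_size: int, device) -> torch.Tensor:
    """Sample binary coalition vectors s ~ p_Sh(s): draw a subset size k with
    probability proportional to (d - 1) / (k * (d - k)), then pick k of the
    d patches uniformly at random (standard Shapley-kernel sampling)."""
    ks = torch.arange(1, d, device=device)
    probs = (d - 1) / (ks * (d - ks))
    k = ks[torch.multinomial(probs / probs.sum(), batch_size, replacement=True)]
    # argsort of iid uniforms gives a random permutation per row; positions
    # whose permutation value is < k form a uniformly random k-subset
    perm = torch.rand(batch_size, d, device=device).argsort(dim=1)
    return (perm < k.unsqueeze(1)).float()

def explainer_training_step(explainer, game, x, y, d: int, num_subsets: int = 32):
    """One step of Algorithm 1: predict phi, apply the additive efficiency
    correction, and regress masked game values onto s^T phi."""
    b, device = x.shape[0], x.device
    phi = explainer(x, y)                              # (b, d) Shapley estimates
    v1 = game(x, torch.ones(b, d, device=device))      # grand coalition v(1)
    v0 = game(x, torch.zeros(b, d, device=device))     # null coalition v(0)
    # set phi <- phi + (1/d)(v(1) - v(0) - 1^T phi), enforcing efficiency
    phi = phi + ((v1 - v0 - phi.sum(dim=1)) / d).unsqueeze(1)
    loss = 0.0
    for _ in range(num_subsets):
        s = sample_coalitions(d, b, device)
        # L = (v(s) - v(0) - s^T phi)^2, averaged over the minibatch
        loss = loss + ((game(x, s) - v0 - (s * phi).sum(dim=1)) ** 2).mean()
    return loss / num_subsets
```

The final update θ ← θ − α∇_θL is then a standard `loss.backward()` followed by an optimizer step.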
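
For the split sizes quoted above, the sketch below shows one way the Imagenette validation/test halves and the Pets 80/10/10 split could be reconstructed, assuming torchvision-style datasets. The random seed is an assumption (the paper does not report one), and the MURA patient-level grouping would additionally require patient IDs, which is not shown here.

```python
import torch
from torch.utils.data import Dataset, random_split

def split_imagenette_val(val_set: Dataset, seed: int = 0):
    """Split Imagenette's 3,925 validation images into validation and test
    halves of 1,962 each (the one leftover example is dropped)."""
    half = len(val_set) // 2                        # 3,925 // 2 = 1,962
    gen = torch.Generator().manual_seed(seed)       # seed is an assumption
    val, test, _ = random_split(
        val_set, [half, half, len(val_set) - 2 * half], generator=gen)
    return val, test

def split_pets(full_set: Dataset, seed: int = 0):
    """Split the 7,349 Oxford-IIIT Pets examples into 5,879 / 735 / 735
    train / validation / test sets (an 80/10/10 split)."""
    gen = torch.Generator().manual_seed(seed)
    return random_split(full_set, [5879, 735, 735], generator=gen)
```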
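
The reported explainer recipe translates into a straightforward PyTorch configuration. The sketch below assumes the color-jitter magnitudes and 224×224 input resolution, which the paper does not specify, and reuses the illustrative `explainer_training_step` from the first sketch; the tanh nonlinearity on the explainer's raw predictions is assumed to live inside the `explainer` module.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR
from torchvision import transforms

# Reported augmentations; the magnitudes and crop size are assumptions.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomVerticalFlip(),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    transforms.ToTensor(),
])

def train_explainer(explainer, game, train_loader, val_loader, d, epochs=100):
    """AdamW with a cosine schedule and max LR 1e-4, 100 epochs, minibatches
    of 64 with 32 subset samples each; keep the best checkpoint by val loss."""
    optimizer = AdamW(explainer.parameters(), lr=1e-4)
    scheduler = CosineAnnealingLR(optimizer, T_max=epochs)
    best_val, best_state = float("inf"), None
    for _ in range(epochs):
        for x, y in train_loader:                   # batch size 64
            loss = explainer_training_step(explainer, game, x, y, d,
                                           num_subsets=32)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()
        with torch.no_grad():                       # model selection pass
            val_loss = sum(
                explainer_training_step(explainer, game, x, y, d).item()
                for x, y in val_loader)
        if val_loss < best_val:
            best_val = val_loss
            best_state = {k: v.clone() for k, v in explainer.state_dict().items()}
    explainer.load_state_dict(best_state)
    return explainer
```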