A Consistent and Efficient Evaluation Strategy for Attribution Methods
Authors: Yao Rong, Tobias Leemann, Vadim Borisov, Gjergji Kasneci, Enkelejda Kasneci
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we aim to overcome these shortcomings and make the evaluation more consistent and efficient. To this end, we propose a new debiased strategy that compensates for confounders causing inconsistencies. Furthermore, we show that in the debiased setting, we can skip the retraining without significant changes in the results. This results in drastic efficiency gains as shown in the lower part of Figure 1. We argue that it is crucial for the community to have sound evaluation strategies that do not suffer from limited accessibility due to the required compute capacity. Specifically, we make the following contributions: ... We examine the mechanisms underlying the evaluation strategies based on perturbation by conducting a rigorous information-theoretic analysis, and formally reveal that results can be significantly confounded. To compensate for this confounder, we propose the Noisy Linear Imputation strategy and empirically prove its efficiency and effectiveness. The proposed strategy significantly decreases the sensitivity to hyperparameters such as the removal order. We generalize our findings to a novel evaluation strategy, ROAD (RemOve And Debias), which can be used to objectively and efficiently evaluate several attribution methods. Compared to previous evaluation strategies requiring retraining, e.g., RemOve And Retrain (ROAR) (Hooker et al., 2019), ROAD saves 99% of the computational costs. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Tübingen, Tübingen, Germany. Correspondence to: Yao Rong <yao.rong@uni-tuebingen.de>, Tobias Leemann <tobias.leemann@uni-tuebingen.de>. |
| Pseudocode | No | The paper describes methods in prose and mathematical terms but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release our source code at https://github.com/tleemann/road_evaluation. |
| Open Datasets | Yes | To empirically confirm our findings, we performed experiments on CIFAR-10 (Krizhevsky et al., 2009). ... We also use Food-101 (Bossard et al., 2014), a large-scale dataset of high-resolution images, to validate the generalizability of our method. |
| Dataset Splits | No | The paper mentions using CIFAR-10 and Food-101 and achieving a test set accuracy, but does not provide specific train/validation/test dataset splits (percentages, counts, or explicit predefined split references). |
| Hardware Specification | Yes | In our experiments, evaluation using the ROAD took only 0.7% of the resources required for ROAR, as given by the runtimes in Table 4 obtained on the same hardware (single Nvidia GTX 2080Ti and 8 Cores). |
| Software Dependencies | No | The paper mentions using ResNet models and the SGD optimizer but does not specify software dependencies with version numbers (e.g., PyTorch or Python versions). |
| Experiment Setup | Yes | The model is trained with the initial learning rate of 0.01 and the SGD optimizer (Sutskever et al., 2013). We decrease the learning rate by a factor of 0.1 after 25 epochs and train the model for 40 epochs on one GPU. ... The learning rate was reduced by a factor of 0.1 after every 10 epochs. In total, we trained 40 epochs with a batch size of 32. |
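
The Noisy Linear Imputation step quoted above admits a compact reconstruction. The sketch below is a minimal, assumption-laden illustration, not the authors' implementation (which is released at https://github.com/tleemann/road_evaluation): each removed pixel is constrained to equal the mean of its 4-connected neighbors, known neighbors contribute constants to the right-hand side of a sparse linear system, and small Gaussian noise is added to the solution. The function name, the 4-connectivity, the noise scale, and the NumPy/SciPy stack are all choices made here for illustration; the released code may differ, e.g., in its neighborhood weighting.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import factorized


def noisy_linear_imputation(image, mask, noise_std=0.01, rng=None):
    """Fill masked pixels so each equals the mean of its 4-neighbors, plus noise.

    image: (H, W, C) float array with values in [0, 1].
    mask:  (H, W) boolean array, True where a pixel was removed.
    Assumes every masked region borders at least one unmasked pixel;
    otherwise the linear system below is singular.
    """
    rng = np.random.default_rng() if rng is None else rng
    H, W, C = image.shape
    flat = image.reshape(-1, C).copy()
    removed = np.flatnonzero(mask.ravel())          # indices of unknown pixels
    if removed.size == 0:
        return image.copy()
    n = removed.size
    row_of = {p: i for i, p in enumerate(removed)}  # pixel index -> equation row

    A = lil_matrix((n, n))
    b = np.zeros((n, C))
    for i, p in enumerate(removed):
        r, c = divmod(p, W)
        neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        valid = [rr * W + cc for rr, cc in neighbors if 0 <= rr < H and 0 <= cc < W]
        A[i, i] = len(valid)            # deg(p) * x_p ...
        for q in valid:
            if q in row_of:
                A[i, row_of[q]] = -1.0  # ... minus the unknown neighbors ...
            else:
                b[i] += flat[q]         # ... equals the sum of known neighbors

    solve = factorized(A.tocsc())       # sparse LU factorization, reused per channel
    x = np.stack([solve(b[:, ch]) for ch in range(C)], axis=1)
    flat[removed] = x + noise_std * rng.standard_normal((n, C))
    return flat.reshape(H, W, C).clip(0.0, 1.0)
```

In the ROAD pipeline, the mask would mark the top-k pixels of an attribution map (MoRF, Most Relevant First) or the bottom-k (LeRF, Least Relevant First), and the imputed image is fed to the unmodified model, with no retraining required.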
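Similarly, the quoted training setup maps onto a few lines of standard PyTorch. The sketch below only encodes what the quotes state (SGD, initial learning rate 0.01, decay by a factor of 0.1, 40 epochs, batch size 32); the ResNet-18 variant, the Nesterov momentum value suggested by the Sutskever et al. citation, and the dummy data are assumptions made here so the snippet runs end to end.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18

# Placeholder model and data; swap in real CIFAR-10 or Food-101 loaders.
model = resnet18(num_classes=10)
data = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))
train_loader = DataLoader(data, batch_size=32, shuffle=True)  # batch size 32 per the quote

# lr=0.01 is from the quote; the momentum/Nesterov settings are assumptions.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True)

# CIFAR-10 schedule per the quote: one 0.1 decay after epoch 25, 40 epochs total.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[25], gamma=0.1)
# Food-101 variant per the quote: decay by 0.1 every 10 epochs instead.
# scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

criterion = torch.nn.CrossEntropyLoss()
for epoch in range(40):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()  # advance the learning-rate schedule once per epoch
```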