Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Decoupling Pixel Flipping and Occlusion Strategy for Consistent XAI Benchmarks
Authors: Stefan Bluecher, Johanna Vielhaben, Nils Strodthoff
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This study proposes two complementary perspectives to resolve this disagreement problem. Firstly, we address the common criticism of occlusion-based XAI, that artificial samples lead to unreliable model evaluations. We propose to measure the reliability by the R(eference)-Out-of-Model-Scope (R-OMS) score. The R-OMS score enables a systematic comparison of occlusion strategies and resolves the disagreement problem by grouping consistent PF rankings. Secondly, we show that the insightfulness of MIF and LIF is conversely dependent on the R-OMS score. To leverage this, we combine the MIF and LIF measures into the symmetric relevance gain (SRG) measure. This breaks the inherent connection to the underlying occlusion strategy and leads to consistent rankings. This resolves the disagreement problem of PF benchmarks, which we verify for a set of 40 different occlusion strategies. |
| Researcher Affiliation | Academia | Stefan Blücher, BIFOLD Berlin Institute for the Foundations of Learning and Data, Machine Learning Group, TU Berlin; Johanna Vielhaben, Explainable Artificial Intelligence Group, Fraunhofer Heinrich-Hertz-Institute; Nils Strodthoff, Division AI4Health, Carl von Ossietzky Universität Oldenburg |
| Pseudocode | No | The paper describes methods and measures in narrative text and mathematical equations (e.g., Equation (1), Equation (2), Equation (3), Equation (4), Equation (5)) but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/bluecher31/pixel-flipping. |
| Open Datasets | Yes | All results are based on 100 randomly selected ImageNet samples. |
| Dataset Splits | No | All results are based on 100 randomly selected ImageNet samples. Based on Section 3, we construct a diverse set of 40 occlusion strategies, varying all design choices (n: 25, 100, 500, 5000; imputer: mean, train set, histogram, cv2, diffusion; model: standard-ResNet50, timm-ResNet50). This text describes the samples used for evaluation but does not specify train/test/validation splits, nor how the 100 evaluation samples relate to the splits used to train the models on ImageNet. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or detailed computer specifications used for running its experiments. |
| Software Dependencies | No | Gradient-based attributions are calculated using captum (Kokhlikyan et al., 2020), LRP using zennit (Anders et al., 2021). These software libraries are mentioned without specific version numbers. |
| Experiment Setup | Yes | Setup This section explores the impact of different occlusion strategies on PF benchmarks. All results are based on 100 randomly selected ImageNet samples. Based on Section 3, we construct a diverse set of 40 occlusion strategies, varying all design choices (n: 25, 100, 500, 5000; imputer: mean, train set, histogram, cv2, diffusion; model: standard-ResNet50, timm-ResNet50). |
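The paper combines the MIF (most influential first) and LIF (least influential first) pixel-flipping curves into the symmetric relevance gain (SRG). A minimal sketch of the idea, assuming SRG is summarized as the difference between the LIF and MIF curve averages and using simple constant-value occlusion; the function names and the toy linear model below are illustrative, not the authors' implementation:

```python
import numpy as np

def pixel_flipping_curve(model, x, attribution, order="MIF", baseline=0.0, steps=10):
    """Occlude features in order of attribution and record the model score.

    order="MIF": most influential first; order="LIF": least influential first.
    Constant-value imputation is only one of the occlusion strategies the
    paper compares (mean, train set, histogram, cv2, diffusion).
    """
    ranking = np.argsort(attribution)          # ascending relevance
    if order == "MIF":
        ranking = ranking[::-1]                # descending relevance
    x_occluded = x.astype(float).copy()
    scores = [model(x_occluded)]               # unperturbed score first
    for chunk in np.array_split(ranking, steps):
        x_occluded[chunk] = baseline           # occlude the next group
        scores.append(model(x_occluded))
    return np.array(scores)

def srg_score(model, x, attribution, **kwargs):
    """Symmetric relevance gain: gap between the LIF and MIF curves
    (summarized here by their means as a simple hypothetical proxy)."""
    mif = pixel_flipping_curve(model, x, attribution, order="MIF", **kwargs)
    lif = pixel_flipping_curve(model, x, attribution, order="LIF", **kwargs)
    return lif.mean() - mif.mean()
```

For a faithful attribution, removing the most influential features first degrades the score fastest, so the LIF curve stays above the MIF curve and the SRG proxy is positive; an uninformative attribution yields a gap near zero regardless of the occlusion strategy.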