Provably Better Explanations with Optimized Aggregation of Feature Attributions
Authors: Thomas Decker, Ananta R. Bhattarai, Jindong Gu, Volker Tresp, Florian Buettner
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments involving various model architectures and popular feature attribution techniques, we demonstrate that our combination strategy consistently outperforms individual methods and existing baselines. |
| Researcher Affiliation | Collaboration | 1 LMU Munich, 2 Siemens AG, 3 Technical University of Munich, 4 University of Oxford, 5 Munich Center for Machine Learning (MCML), 6 Goethe University Frankfurt, 7 German Cancer Research Center (DKFZ). |
| Pseudocode | No | The paper describes algorithms and methods in text and mathematical formulas but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Accompanying source code is released at https://github.com/thomdeck/aggopt. |
| Open Datasets | Yes | The findings presented in this section are based on the ImageNet ILSVRC2012 dataset and concrete implementation details are documented in Appendix C. To substantiate the findings in the main paper, we repeated the experiments in Section 5.1 on four additional datasets, namely CIFAR10 as well as three medical image datasets: BloodMNIST, DermaMNIST and PathMNIST (Yang et al., 2023). |
| Dataset Splits | Yes | All our aggregation strategies are optimized using only a small number of metric evaluation samples to approximate the underlying metric ($m_{agg} = 50$). We explicitly test how well the improvements generalize to a larger sample of novel metric evaluations ($m_{eval} = 200$) and if they transfer to alternative quality measures. |
| Hardware Specification | Yes | In Table 7 we report the time required to retrieve optimal aggregation weights across 7 explainers for different models, evaluated on an NVIDIA RTX A5000 GPU and averaged over 100 samples with corresponding standard deviations. |
| Software Dependencies | No | The paper mentions various software components like torchvision, the timm library, Quantus, OpenXAI, pytorch-gradcam, and cvxpy, but it does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | All our aggregation strategies are optimized by estimating the underlying $\ell_2$ metric using $m_{agg}$ metric evaluation samples, yielding $\widehat{\text{SENS}}_{\text{AVG}}$ and $\widehat{\text{INFD}}$. In particular, we have $\widehat{\text{SENS}}_{\text{AVG}}(\phi_\omega) = \frac{1}{m_{agg}} \sum_{j=1}^{m_{agg}} \lVert \phi_\omega(x) - \phi_\omega(x + \varepsilon^{(j)}) \rVert_2^2$ and $\widehat{\text{INFD}}(\phi_\omega) = \frac{1}{m_{agg}} \sum_{j=1}^{m_{agg}} \left( I^{(j)\top} \phi_\omega(x) - \left( f(x) - f(h(x, x_b, I^{(j)})) \right) \right)^2$. To optimize the aggregation weights for $\text{AGG}_{\text{robust}}$ and $\text{AGG}_{\text{opt}}$, the expectation is estimated using only $m_{agg}$ samples for $\varepsilon$. For $\text{AGG}_{\text{opt}}$ we additionally normalized both metrics using the Frobenius norm $\lVert \Gamma^\top \Gamma \rVert_F$ of the respective parameter matrix to ensure comparability between the two considered metrics. In particular, we utilized binary perturbations $I \in \{0, 1\}^d$ that randomly select an image area of 20%. Illustrative sketches of these estimators and the weight optimization follow below the table. |
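
To make the estimators above concrete, here is a minimal sketch in NumPy of how $\widehat{\text{SENS}}_{\text{AVG}}$ and $\widehat{\text{INFD}}$ could be computed for an aggregated attribution $\phi_\omega$. The function names (`aggregate`, `sens_avg_hat`, `infd_hat`), the Gaussian noise model, and the random-coordinate masks are illustrative assumptions, not the authors' implementation; in particular, the paper masks a contiguous image area of 20%, whereas this sketch samples random coordinates.

```python
import numpy as np

# Minimal sketch (not the authors' code): empirical metric estimators for the
# aggregated attribution phi_omega = Phi^T omega, where Phi stacks the k
# individual attribution maps of an input x, flattened to d features each.

def aggregate(Phi, omega):
    """Weighted combination of k attribution maps; Phi has shape (k, d)."""
    return Phi.T @ omega  # shape (d,)

def sens_avg_hat(attribute, x, omega, m_agg=50, noise_scale=0.1, seed=0):
    """SENS_AVG estimate with m_agg random perturbations epsilon^(j).

    attribute(x) is assumed to return the (k, d) stack of attributions.
    The Gaussian noise model is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    base = aggregate(attribute(x), omega)
    diffs = [
        np.sum((base - aggregate(attribute(x + eps), omega)) ** 2)
        for eps in (noise_scale * rng.standard_normal(x.shape) for _ in range(m_agg))
    ]
    return np.mean(diffs)

def infd_hat(attribute, f, x, x_b, omega, m_agg=50, area=0.2, seed=0):
    """INFD estimate with binary masks I^(j) in {0,1}^d covering ~20% of features.

    h(x, x_b, I) replaces the masked features of x with baseline values x_b.
    Random coordinates are used here; the paper selects a contiguous image area.
    """
    rng = np.random.default_rng(seed)
    d = x.size
    phi = aggregate(attribute(x), omega)
    errs = []
    for _ in range(m_agg):
        I = np.zeros(d)
        I[rng.choice(d, size=int(area * d), replace=False)] = 1.0
        x_pert = np.where(I.reshape(x.shape) == 1.0, x_b, x)  # h(x, x_b, I^(j))
        errs.append((I @ phi - (f(x) - f(x_pert))) ** 2)
    return np.mean(errs)
```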
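Because both estimators are quadratic in the weights $\omega$, finding the optimal aggregation reduces to a small convex program; the paper lists cvxpy among its dependencies, so the following is a hedged sketch of how such a program could look, assuming the weights form a convex combination over the simplex. The names `Gamma_sens`, `Gamma_infd`, and `c` are illustrative: `Gamma_sens` stacks the attribution differences so that $\widehat{\text{SENS}}_{\text{AVG}} \approx \lVert \Gamma_{sens}\,\omega \rVert_2^2 / m_{agg}$, while `Gamma_infd` has rows $I^{(j)\top}\Phi(x)^\top$ with targets $c_j = f(x) - f(h(x, x_b, I^{(j)}))$.

```python
import numpy as np
import cvxpy as cp

def optimal_weights(Gamma_sens, Gamma_infd, c, m_agg=50):
    """Sketch: simplex weights minimizing the normalized sum of both metrics.

    Gamma_sens: (m_agg * d, k) stacked attribution differences
    Gamma_infd: (m_agg, k) rows I^(j)^T Phi(x)^T, with targets c of shape (m_agg,)
    """
    k = Gamma_sens.shape[1]
    w = cp.Variable(k)
    # Frobenius norms of the parameter matrices Gamma^T Gamma, used to
    # normalize the two metrics for comparability (as described above).
    n_s = np.linalg.norm(Gamma_sens.T @ Gamma_sens)   # Frobenius by default
    n_i = np.linalg.norm(Gamma_infd.T @ Gamma_infd)
    objective = (cp.sum_squares(Gamma_sens @ w) / (m_agg * n_s)
                 + cp.sum_squares(Gamma_infd @ w - c) / (m_agg * n_i))
    constraints = [w >= 0, cp.sum(w) == 1]  # convex combination of explainers
    cp.Problem(cp.Minimize(objective), constraints).solve()
    return w.value
```

Note that the program has only $k$ variables (one per explainer) regardless of the input dimension, which is consistent with the small per-sample optimization times the report quotes for Table 7.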