Robust Models Are More Interpretable Because Attributions Look Normal
Authors: Zifan Wang, Matt Fredrikson, Anupam Datta
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "With both analytical (Sec. 3) and empirical (Sec. 5) results, we show that the gradient of the model with respect to its input... We empirically demonstrate that one such type of boundary attribution, called Boundary-based Integrated Gradients (BIG), produces explanations that are more accurate than prior attribution methods (relative to ground-truth bounding box information), while mitigating the problem of baseline sensitivity that is known to impact applications of Integrated Gradients (Sundararajan et al., 2017) (Section 6)." and "5. Evaluation" |
| Researcher Affiliation | Academia | Zifan Wang¹, Matt Fredrikson¹, Anupam Datta¹. ¹Carnegie Mellon University, Pittsburgh, PA 15213, USA. Correspondence to: Zifan Wang <zifan@cmu.edu>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code can be found at https://github.com/zifanw/boundary. |
| Open Datasets | Yes | We conduct experiments over two data distributions, ImageNet (Russakovsky et al., 2015) and CIFAR-10 (Krizhevsky et al.). |
| Dataset Splits | No | The paper uses pre-trained models and evaluates on specific subsets of correctly-classified images from ImageNet (1500) and CIFAR-10 (5000), but does not provide details on training, validation, or test splits for reproducing model training from scratch. |
| Hardware Specification | Yes | All computations are done using a GPU accelerator Titan RTX with a memory size of 24 GB. |
| Software Dependencies | Yes | All attributions are implemented with Captum (Kokhlikyan et al., 2020) and visualized with TruLens (Leino et al., 2021a). The implementation of PGDs and CW are based on Foolbox (Rauber et al., 2020; 2017) and the implementation of AutoPGD is based on the authors' public repository (we only use apgd-ce and apgd-dlr losses for efficiency reasons). |
| Experiment Setup | Yes | Implementation details of the boundary search (by ensembling the results of PGD, CW and AutoPGD) and the hyperparameters used in our experiments are included in Appendix B.2. Hyper-parameters for each attack can be found in Table 7. The details of our implementation are discussed in Section 5, where we show that this yields good results in practice. |
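
The Experiment Setup row mentions a boundary search that ensembles the results of several attacks, feeding Boundary-based Integrated Gradients (BIG). The sketch below is one plausible reading of that idea, not the authors' implementation: it assumes the ensemble keeps the closest candidate whose predicted label differs from the input's, and then uses that point as the Integrated Gradients baseline. The toy `tanh` model, the hand-written candidate points (standing in for attack outputs), and the helper names `nearest_boundary_point` and `integrated_gradients` are all hypothetical. It uses plain NumPy rather than Captum or Foolbox.

```python
import numpy as np

def nearest_boundary_point(x, candidates, predict):
    """Return the closest candidate whose predicted label differs from x's.

    Assumption: this is how results from several attacks are ensembled.
    """
    label = predict(x)
    best, best_dist = None, np.inf
    for c in candidates:
        if c is None or predict(c) == label:
            continue  # this attack did not cross the decision boundary
        dist = np.linalg.norm(c - x)
        if dist < best_dist:
            best, best_dist = c, dist
    return best

def integrated_gradients(x, baseline, grad_fn, steps=64):
    """Midpoint Riemann-sum approximation of Integrated Gradients
    along the straight path from baseline to x."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

# Hypothetical toy model: a tanh score over a linear map, class 1 if score > 0.
w = np.array([1.0, -2.0, 0.5])

def score(z):
    return np.tanh(w @ z)

def grad_score(z):
    return (1.0 - np.tanh(w @ z) ** 2) * w

def predict(z):
    return int(score(z) > 0)

x = np.array([1.0, 0.2, 0.4])

# Stand-ins for the outputs of three different attacks (made-up values).
candidates = [
    np.array([0.0, 0.7, 0.4]),  # crosses the boundary, but far from x
    np.array([0.5, 0.5, 0.4]),  # crosses the boundary, closer to x
    np.array([0.9, 0.2, 0.4]),  # fails to change the prediction
]

baseline = nearest_boundary_point(x, candidates, predict)
attributions = integrated_gradients(x, baseline, grad_score)
print("baseline:", baseline)
print("attributions:", attributions)
```

Because Integrated Gradients satisfies completeness, the attributions should sum to approximately `score(x) - score(baseline)`, which gives a quick sanity check on the path integral.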