On the Relationship Between Explanation and Prediction: A Causal View
Authors: Amir-Hossein Karimi, Krikamol Muandet, Simon Kornblith, Bernhard Schölkopf, Been Kim
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our work borrows tools from causal inference to systematically assay this relationship. More specifically, we study the relationship between E and Y by measuring the treatment effect when intervening on their causal ancestors, i.e., on the hyperparameters and inputs used to generate saliency-based Es or Ys. Our results suggest that the relationship between E and Y is far from ideal. In fact, the gap from the ideal case only increases in higher-performing models, models that are likely to be deployed. Our work is a promising first step towards providing a quantitative measure of the relationship between E and Y, which could also inform the future development of methods for E with a quantitative metric. (A minimal sketch of this intervention logic appears below the table.) |
| Researcher Affiliation | Collaboration | ¹MPI for Intelligent Systems, ²ETH Zurich, ³Google Research, Brain Team, ⁴CISPA Helmholtz Center for Information Security. Correspondence to: Amir-Hossein Karimi <amir@tue.mpg.de>. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions that "All methods are openly accessible here: https://github.com/PAIR-code/saliency" (Footnote 4), but this refers to the code for the third-party saliency methods used, not the authors' own implementation of their causal analysis methodology. (A usage sketch of that library appears below the table.) |
| Open Datasets | Yes | We use the dataset provided by Unterthiner et al. (2020), a large collection of existing models that have already been trained with pre-specified hyperparameters (see Section 3.1 for more detail). ... and the models are trained on commonly used CIFAR10, SVHN, MNIST, and FASHION MNIST datasets. |
| Dataset Splits | No | The paper mentions using models evaluated based on "test accuracy" and refers to "test accuracy boundaries" in Table 2, but it does not explicitly provide details on training, validation, or test dataset splits (e.g., specific percentages or sample counts for each partition) used in the original creation of the model zoo, nor for its own analysis. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments (e.g., specific GPU or CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in their experimental setup. |
| Experiment Setup | Yes | The set of hyperparameters considered includes the choice of optimizer, w0 type, w0 std., b0 type, choice of activation function, learning rate, ℓ2 regularization, dropout strength, and dataset split (see Unterthiner et al., 2020, Appendix A.2). ... The following markers are used for (log-)rounding continuous features: ℓ2 reg.: [1e-8, 1e-6, 1e-4, 1e-2], dropout: [0, 0.2, 0.45, 0.7], w0 std.: [1e-3, 1e-2, 1e-1, 0.5], learning rate: [5e-4, 5e-3, 5e-2]. (A rounding sketch appears below the table.) |
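
To make the intervention logic in the Research Type row concrete: the paper measures treatment effects when intervening on causal ancestors of E and Y, such as hyperparameters. The sketch below is our illustration, not the authors' code; it assumes a hypothetical table `runs` in which each row holds one trained model's hyperparameters and a precomputed E-Y agreement score (the column name `agreement` is illustrative).

```python
# A minimal sketch of estimating the effect of a hyperparameter intervention
# on E-Y agreement. NOT the authors' implementation; `runs` and `agreement`
# are hypothetical names standing in for the model-zoo records and whatever
# explanation-prediction agreement metric is computed per model.
import pandas as pd

def average_treatment_effect(runs: pd.DataFrame,
                             hyperparam: str,
                             treated_value, control_value) -> float:
    """Mean difference in E-Y agreement between models whose causal
    ancestor `hyperparam` was set to `treated_value` vs. `control_value`."""
    treated = runs.loc[runs[hyperparam] == treated_value, "agreement"]
    control = runs.loc[runs[hyperparam] == control_value, "agreement"]
    return treated.mean() - control.mean()

# Example: effect of switching the activation function on E-Y agreement.
# ate = average_treatment_effect(runs, "activation", "relu", "tanh")
```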
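The Open Source Code row points to the third-party PAIR-code/saliency package. The sketch below shows one way a vanilla-gradient saliency map could be obtained with that package's documented `call_model_function` protocol; it assumes a TF2 Keras model `model`, a float32 image `im` of shape (H, W, C), and a target class index, none of which come from the paper.

```python
# A minimal vanilla-gradient sketch against the PAIR-code saliency package
# (the third-party library cited in the paper). `model`, `im`, and the
# target class are assumed inputs; this is not the authors' analysis code.
import tensorflow as tf
import saliency.core as saliency

def call_model_function(images, call_model_args=None, expected_keys=None):
    # The library calls this hook to obtain d(output)/d(input) for a batch.
    images = tf.convert_to_tensor(images)
    with tf.GradientTape() as tape:
        tape.watch(images)
        logits = call_model_args["model"](images)
        target = logits[:, call_model_args["target_class"]]
    grads = tape.gradient(target, images)
    return {saliency.INPUT_OUTPUT_GRADIENTS: grads.numpy()}

grad_method = saliency.GradientSaliency()
mask = grad_method.GetMask(
    im, call_model_function,
    call_model_args={"model": model, "target_class": 0})  # (H, W, C) map
```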
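Finally, the Experiment Setup row quotes the markers used for (log-)rounding continuous hyperparameters. The marker values below are taken verbatim from the paper; the snapping helper itself is our illustration of one plausible reading of "(log-)rounding", namely rounding to the nearest marker in log10 space.

```python
# A sketch of (log-)rounding: snap each continuous hyperparameter to its
# nearest marker in log10 space, falling back to linear distance when zeros
# (e.g., dropout = 0) make the log undefined. Marker lists are from the
# paper; the helper is an assumption about how rounding is done.
import numpy as np

MARKERS = {
    "l2_reg":        [1e-8, 1e-6, 1e-4, 1e-2],
    "dropout":       [0.0, 0.2, 0.45, 0.7],
    "w0_std":        [1e-3, 1e-2, 1e-1, 0.5],
    "learning_rate": [5e-4, 5e-3, 5e-2],
}

def log_round(value: float, markers) -> float:
    """Return the marker closest to `value` in log space (linear fallback)."""
    markers = np.asarray(markers, dtype=float)
    if value > 0 and np.all(markers > 0):
        idx = np.argmin(np.abs(np.log10(markers) - np.log10(value)))
    else:
        idx = np.argmin(np.abs(markers - value))
    return float(markers[idx])

# Example: log_round(3e-5, MARKERS["l2_reg"]) -> 1e-4
```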