Don’t trust your eyes: on the (un)reliability of feature visualizations
Authors: Robert Geirhos, Roland S. Zimmermann, Blair Bilodeau, Wieland Brendel, Been Kim
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We start our investigation by developing network circuits that trick feature visualizations into showing arbitrary patterns that are completely disconnected from normal network behavior on natural input. We then provide evidence for a similar phenomenon occurring in standard, unmanipulated networks: feature visualizations are processed very differently from standard input, casting doubt on their ability to explain how neural networks process natural images. This can be used as a sanity check for feature visualizations. We underpin our empirical findings by theory proving that the set of functions that can be reliably understood by feature visualization is extremely small and does not include general black-box neural networks. |
| Researcher Affiliation | Collaboration | 1Google DeepMind, 2Max Planck Institute for Intelligent Systems, 3Tübingen AI Center, 4Department of Statistical Sciences, University of Toronto. |
| Pseudocode | No | The paper does not contain any explicitly labeled “Pseudocode” or “Algorithm” blocks or figures. |
| Open Source Code | Yes | Code to replicate experiments from this paper is available here: https://github.com/google-research/fooling-feature-visualizations/ |
| Open Datasets | Yes | For classifier training, we create a dataset by combining 1,281,167 images from the training set of the ImageNet 2012 dataset and 472,500 synthetic images. |
| Dataset Splits | Yes | This can be verified by checking the network's validation accuracy on ImageNet-1K, which only minimally changes when deceiving all visualizations in the last layer of Inception-V1 (top-1 accuracy changes from 69.146% to 68.744%; top-5 from 88.858% to 88.330%).; natural ImageNet validation images |
| Hardware Specification | No | No specific hardware details (like GPU models, CPU types, or memory specifications) used for running experiments are provided in the paper. The paper states: “There are no special compute requirements (e.g., we do not train large models).” |
| Software Dependencies | Yes | Throughout the paper, feature visualizations were generated using the lucent library (Greentfrapp), version v0.1.8. (A hedged usage sketch follows the table.) |
| Experiment Setup | Yes | We train a model implementing the simple six-layer CNN architecture displayed in Table 3 for 8 epochs on the aforementioned dataset with an SGD optimizer using a learning rate of 0.01, momentum of 0.9 and weight decay of 0.00005.; For Figure 1, we used the thresholds that visually looked best... specifically thresholds=(512,512,512,6,32,6) for the six visualizations from left to right... transforms=lucent.optvis.transform.standard_transforms + [center_crop(224, 224)] was used. The image was parameterized via param_f=lambda: lucent.optvis.param.image(224, batch=1). (Both setups are sketched below the table.) |
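
The training row above reports only hyperparameters, so here is a minimal PyTorch sketch of that configuration (8 epochs, SGD with learning rate 0.01, momentum 0.9, weight decay 0.00005). The six-layer CNN of Table 3 and the combined ImageNet + synthetic dataset are not reproduced here; the `nn.Sequential` body and the random-tensor loader are placeholders, not the authors' code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder for the six-layer CNN described in Table 3 of the paper.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 1000),
)

# Placeholder loader; the paper combines ImageNet 2012 training images
# with 472,500 synthetic images.
train_loader = DataLoader(
    TensorDataset(torch.randn(8, 3, 224, 224), torch.randint(0, 1000, (8,))),
    batch_size=4,
)

# Optimizer settings as quoted in the Experiment Setup row.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.01, momentum=0.9, weight_decay=0.00005
)
criterion = nn.CrossEntropyLoss()

for epoch in range(8):  # 8 epochs as reported
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```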
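
The visualization settings quoted above map onto lucent's `render_vis` API; the sketch below shows how they might be assembled with lucent v0.1.8. The objective string `"mixed5b:0"` is an assumed example (the paper visualizes many units), `center_crop` is referenced in the paper's text but is not a lucent built-in and is therefore left commented out, and only a single threshold is used here for brevity.

```python
import torch
from lucent.modelzoo import inceptionv1
from lucent.optvis import render, param, transform

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = inceptionv1(pretrained=True).to(device).eval()

# Image parameterization as quoted: a single 224x224 image.
param_f = lambda: param.image(224, batch=1)

# Standard lucent transforms; the paper additionally appends a center crop
# (assumed to be a project-specific helper, omitted here).
transforms = transform.standard_transforms  # + [center_crop(224, 224)]

images = render.render_vis(
    model,
    "mixed5b:0",        # assumed example objective: layer "mixed5b", channel 0
    param_f=param_f,
    transforms=transforms,
    thresholds=(512,),  # the paper uses per-panel thresholds, e.g. (512, 512, 512, 6, 32, 6)
    show_image=False,
)
```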