Scale Alone Does not Improve Mechanistic Interpretability in Vision Models

Authors: Roland S. Zimmermann, Thomas Klein, Wieland Brendel

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a large-scale psychophysical study (see Fig. 1) to investigate the effects of scale and other design choices and find no practically relevant differences between any of the investigated models.
Researcher Affiliation | Academia | 1 Max Planck Institute for Intelligent Systems, Tübingen AI Center, Tübingen, Germany; 2 University of Tübingen, Tübingen AI Center, Tübingen, Germany.
Pseudocode | No | No pseudocode or algorithm blocks are present in the paper. Methodological steps are described in paragraph form.
Open Source Code | Yes | Code & Dataset: brendel-group.github.io/imi.
Open Datasets | Yes | We investigate nine computer vision models compatible with ImageNet classification [42] ... pre-training on 400 million LAION [45] samples.
Dataset Splits | Yes | We investigate nine computer vision models compatible with ImageNet classification [42] ... While models vary widely in terms of their classification performance (60% to 85%), their interpretability varies in a much narrower range for each method (see Fig. 4a (Left)). For this, we use two metrics: for one, the model's classification performance on ImageNet; for another, a measure of consistency between a model's and human decisions [14]. We compute the average local contrast in the activation maps caused by validation set images for the sampled units of the investigated convolutional networks. (One possible implementation of this contrast measure is sketched below the table.)
Hardware Specification | Yes | We record the activations on Nvidia 2080Ti GPUs and perform multiple forward passes due to memory constraints, but even if we assume a pessimistic 4 hours of GPU time and full utilization of the GPU at 250 W, this results in 9 kWh power consumption for all models in total. Creating feature visualizations for 100 randomly selected units (we later randomly sample 84 units for each model and kept some stimuli for anticipated later experiments) requires the parallel use of 25 2080Ti GPUs for about 12 hours for all models except ConvNeXt, which takes about 24 hours on average. (This arithmetic is reproduced in a snippet below the table.)
Software Dependencies | No | No specific software dependencies with version numbers are explicitly mentioned in the paper.
Experiment Setup | Yes | We conduct a large-scale psychophysical study (see Fig. 1) to investigate the effects of scale and other design choices ... As interpretability is a human-centric model attribute, we perform a large-scale psychophysical experiment to measure the interpretability of models and individual units. For this, we use the experimental paradigm proposed by Borowski et al. [4] and Zimmermann et al. [54]: here, the ability of humans to predict the sensitivity of units is used to measure interpretability. Specifically, crowd workers on Amazon Mechanical Turk complete a series of 2-Alternative-Forced-Choice (2-AFC) tasks (see Fig. 2 for an illustration). In each task, they are presented with a pair of strongly and weakly activating (query) images for a specific unit and are asked to identify the strongly activating one. During this task, they are supported by 18 explanatory (reference) images ... We begin by making the task as easy as possible by choosing the query images as the most/least activating samples from the entire ImageNet dataset ... Once they have successfully solved the practice trials, they are admitted to the main experiment, in which they see 40 real trials interspersed with five fairly obvious catch-trials. (A sketch of the activation ranking behind this image selection appears below the table.)
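
The Experiment Setup row describes choosing query images as the most/least activating samples from the entire ImageNet dataset for a given unit. The excerpt contains no code for this step, so the following is a minimal PyTorch sketch under the assumption that a unit's response to an image is the spatial mean of its channel in a chosen convolutional layer; `rank_images_by_unit_activation`, the hooked layer, and the dataloader are illustrative placeholders rather than the authors' implementation.

```python
import torch

@torch.no_grad()
def rank_images_by_unit_activation(model, layer, unit_idx, dataloader, device="cuda"):
    """Return dataset indices sorted from least to most activating for one unit.
    Assumption (not stated in the excerpt): a unit's response to an image is the
    spatial mean of its channel in `layer`'s output."""
    scores = []

    def hook(_module, _inputs, output):
        # output: (N, C, H, W) -> one scalar response per image for the chosen channel
        scores.append(output[:, unit_idx].mean(dim=(1, 2)).cpu())

    handle = layer.register_forward_hook(hook)
    model.eval().to(device)
    for images, _labels in dataloader:      # a standard ImageNet loader; labels unused
        model(images.to(device))
    handle.remove()

    scores = torch.cat(scores)
    return scores.argsort(), scores         # ascending: least -> most activating

# Hypothetical usage: the extremes of this ranking would supply the strongly/weakly
# activating query images, and images near the extremes the 18 reference images.
# order, scores = rank_images_by_unit_activation(model, model.layer4, unit_idx=17,
#                                                dataloader=val_loader)
# least_activating, most_activating = order[0], order[-1]
```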
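
The Dataset Splits row quotes the computation of "the average local contrast in the activation maps caused by validation set images." The excerpt does not define the contrast measure, so the sketch below assumes one plausible reading, the mean absolute deviation of each activation from the average of its spatial neighbours; treat it as an illustration, not the paper's metric.

```python
import torch
import torch.nn.functional as F

def average_local_contrast(activation_map: torch.Tensor) -> torch.Tensor:
    """Mean absolute difference between each activation and the mean of its four
    spatial neighbours (zero-padded at the border). `activation_map` has shape (H, W).
    NOTE: this definition of local contrast is an assumption, not taken from the paper."""
    a = activation_map[None, None]                      # (1, 1, H, W) for conv2d
    neighbour_kernel = torch.tensor([[0., 1., 0.],
                                     [1., 0., 1.],
                                     [0., 1., 0.]]) / 4.0
    neighbour_mean = F.conv2d(a, neighbour_kernel[None, None], padding=1)
    return (a - neighbour_mean).abs().mean()

# Hypothetical usage, averaging over validation images for one sampled unit:
# acts = truncated_model(val_images)                    # (N, C, H, W) feature maps
# contrast = torch.stack([average_local_contrast(acts[i, unit_idx])
#                         for i in range(acts.shape[0])]).mean()
```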
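
The 9 kWh figure in the Hardware Specification row follows directly from the quoted numbers (nine models, a pessimistic 4 GPU-hours per model, 250 W per 2080Ti at full utilization); the snippet below simply reproduces that arithmetic.

```python
# Pessimistic energy estimate for recording activations, using the numbers quoted above.
n_models = 9                # nine models investigated
gpu_hours_per_model = 4     # assumed upper bound of GPU time per model
gpu_power_kw = 0.250        # Nvidia 2080Ti at full utilization (250 W)

total_kwh = n_models * gpu_hours_per_model * gpu_power_kw
print(f"{total_kwh:.0f} kWh")   # 9 kWh, matching the figure in the Hardware Specification row
```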