Measuring Per-Unit Interpretability at Scale Without Humans

Authors: Roland S. Zimmermann, David Klindt, Wieland Brendel

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate its predictive power through an interventional human psychophysics study. We demonstrate the usefulness of this measure by performing previously infeasible experiments: (1) A large-scale interpretability analysis across more than 70 million units from 835 computer vision models, and (2) an extensive analysis of how units transform during training.
Researcher Affiliation | Academia | Roland S. Zimmermann (MPI-IS, Tübingen AI Center); David Klindt (Stanford); Wieland Brendel (MPI-IS, Tübingen AI Center)
Pseudocode | No | The paper provides mathematical equations for the Machine Interpretability Score (MIS) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. (See the sketch after this table.)
Open Source Code | Yes | Online version, code and interactive visualizations available at brendel-group.github.io/mis.
Open Datasets | Yes | Both query images and explanations are chosen from the training set of ImageNet-2012 [40].
Dataset Splits | No | The paper mentions using the 'training set of ImageNet-2012' for query images and explanations, and refers to a 'training recipe' for a ResNet-50, but it does not explicitly state the percentages or counts of training, validation, and test splits used in its experiments.
Hardware Specification | Yes | Evaluating all units of a model takes, on average and varying depending on the model’s size, less than one hour on a GPU (e.g., NVIDIA RTX 2080-TI or V100).
Software Dependencies | No | The paper mentions using 'DreamSim' and a 'training recipe' but does not specify version numbers for key software components or libraries such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | To choose α, we use the interpretability annotations of IMI [50]: We optimize α over a randomly chosen subset of just 5% of the annotated units to approximately match the value range of human interpretability scores, resulting in α = 0.16. ... As they used up to 20 tasks per unit, we average over N = 20. ... For this, we train a ResNet-50 on ImageNet-2012, following the training recipe A3 of Wightman et al. [45], for 100 epochs. (See the calibration sketch after this table.)
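Since the paper provides equations but no pseudocode for the MIS (see the Pseudocode row), the following is a minimal sketch of how the quoted pieces fit together. It assumes the MIS averages a logistic readout of two-alternative-forced-choice (2AFC) decision margins over the N = 20 tasks per unit, with the temperature α = 0.16 from the Experiment Setup row; the function name mis(), the margin formulation, and the exact sigmoid form are illustrative assumptions, not the authors' released implementation (see brendel-group.github.io/mis for that).

    import numpy as np

    def mis(per_task_evidence, alpha=0.16):
        """Machine Interpretability Score for one unit (illustrative sketch).

        per_task_evidence: length-N array of 2AFC decision margins, one per
        task (the paper averages over N = 20 tasks per unit). Each margin is
        assumed to come from a perceptual similarity model such as DreamSim,
        e.g. how much more similar the correct query image is to the positive
        explanation images than the foil query is.
        alpha: temperature of the logistic readout; the paper calibrates it
        on 5% of the IMI annotations, obtaining alpha = 0.16.
        """
        evidence = np.asarray(per_task_evidence, dtype=float)
        soft_accuracy = 1.0 / (1.0 + np.exp(-evidence / alpha))  # per-task soft 2AFC accuracy
        return float(soft_accuracy.mean())  # average over the N tasks

    # Example: 20 synthetic task margins for a fairly interpretable unit.
    rng = np.random.default_rng(0)
    margins = rng.normal(loc=0.2, scale=0.1, size=20)
    print(round(mis(margins), 3))  # consistently positive margins give an MIS near 1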
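The Experiment Setup row quotes how α is chosen but not the objective used. A hedged sketch of that calibration, assuming a plain grid search with a squared-error criterion against the human scores (the paper only says α is optimized to approximately match their value range), could look like this; it reuses mis() from the sketch above, and calibrate_alpha is a hypothetical helper name.

    import numpy as np

    def calibrate_alpha(evidence_per_unit, human_scores,
                        alphas=np.linspace(0.01, 1.0, 100)):
        """Grid-search the temperature alpha on annotated units (sketch).

        evidence_per_unit: one per-task evidence array per annotated unit
        (the paper uses a random 5% subset of the IMI-annotated units).
        human_scores: human interpretability score per unit, same order.
        """
        best_alpha, best_err = None, np.inf
        for a in alphas:
            preds = np.array([mis(e, alpha=a) for e in evidence_per_unit])
            err = float(np.mean((preds - np.asarray(human_scores)) ** 2))  # assumed criterion
            if err < best_err:
                best_alpha, best_err = a, err
        return best_alpha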