Measuring Per-Unit Interpretability at Scale Without Humans
Authors: Roland S. Zimmermann, David Klindt, Wieland Brendel
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate its predictive power through an interventional human psychophysics study. We demonstrate the usefulness of this measure by performing previously infeasible experiments: (1) A large-scale interpretability analysis across more than 70 million units from 835 computer vision models, and (2) an extensive analysis of how units transform during training. |
| Researcher Affiliation | Academia | Roland S. Zimmermann (MPI-IS, Tübingen AI Center), David Klindt (Stanford), Wieland Brendel (MPI-IS, Tübingen AI Center) |
| Pseudocode | No | The paper provides mathematical equations for the Machine Interpretability Score (MIS) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Online version, code and interactive visualizations available at brendel-group.github.io/mis. |
| Open Datasets | Yes | Both query images and explanations are chosen from the training set of ImageNet-2012 [40]. |
| Dataset Splits | No | The paper mentions using the 'training set of ImageNet-2012' for query images and explanations, and refers to a 'training recipe' for a ResNet-50, but it does not explicitly state the specific percentages or counts for training, validation, and test splits for its experiments. |
| Hardware Specification | Yes | Evaluating all units of a model takes, on average and varying depending on the model’s size, less than one hour on a GPU (e.g., NVIDIA RTX 2080-TI or V100). |
| Software Dependencies | No | The paper mentions using 'DreamSim' and a 'training recipe' but does not specify version numbers for key software components or libraries like Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | To choose α, we use the interpretability annotations of IMI [50]: We optimize α over a randomly chosen subset of just 5% of the annotated units to approximately match the value range of human interpretability scores, resulting in α = 0.16. ... As they used up to 20 tasks per unit, we average over N = 20. ... For this, we train a ResNet-50 on ImageNet-2012, following the training recipe A3 of Wightman et al. [45], for 100 epochs. |
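The setup above can be sketched in code. This is a minimal, hypothetical illustration of how a per-unit score might be averaged over N = 20 two-alternative forced-choice (2AFC) tasks with the fitted scale α = 0.16; the function name, the shape of the per-task `evidence` values (e.g., a DreamSim-similarity difference between a query image and the positive vs. negative explanation images), and the use of α as a logistic temperature are all assumptions, not the paper's exact equations.

```python
import math


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def machine_interpretability_score(evidence, alpha: float = 0.16) -> float:
    """Hypothetical sketch of a per-unit MIS.

    `evidence` holds one value per 2AFC task (assumed here to be a
    perceptual-similarity difference favoring the correct choice).
    `alpha = 0.16` is the scale the authors report fitting on 5% of the
    IMI annotations; treating it as a temperature is an assumption.
    Returns the mean logistic "probability correct" over the N tasks.
    """
    probs = [_sigmoid(e / alpha) for e in evidence]
    return sum(probs) / len(probs)  # average over N (= 20) tasks
```

With zero evidence on every task the score is 0.5 (chance level for a 2AFC task), and it rises toward 1.0 as the per-task evidence grows, matching the intuition of a bounded interpretability score.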