A Multimodal Automated Interpretability Agent
Authors: Tamar Rott Shaham, Sarah Schwettmann, Franklin Wang, Achyuta Rajaram, Evan Hernandez, Jacob Andreas, Antonio Torralba
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate applications of MAIA to computer vision models. We first characterize MAIA's ability to describe (neuron-level) features in learned representations of images. Across several trained models and a novel dataset of synthetic vision neurons with paired ground-truth descriptions, MAIA produces descriptions comparable to those generated by expert human experimenters. We then show that MAIA can aid in two additional interpretability tasks: reducing sensitivity to spurious features, and automatically identifying inputs likely to be misclassified. |
| Researcher Affiliation | Academia | Tamar Rott Shaham*, Sarah Schwettmann*, Franklin Wang, Achyuta Rajaram, Evan Hernandez, Jacob Andreas, Antonio Torralba (all MIT CSAIL). Correspondence to: <tamarott@mit.edu, schwett@mit.edu>. |
| Pseudocode | No | The full MAIA API provided in the system prompt is reproduced in the paper as executable Python (e.g., `import torch`, `from typing import List, Tuple`, `class System: ...`), followed by usage examples such as a `run_experiment(system, tools)` function that synthesizes an image with `tools.text2image(["a dog standing on the grass"])` and reads off the activation values and activation maps returned by `system.neuron(image)`. The paper provides executable Python code examples for API usage rather than abstract pseudocode or clearly labeled algorithm blocks; a cleaned-up, runnable version of this example appears as the first sketch below the table. |
| Open Source Code | Yes | Website: https://multimodal-interpretability.csail.mit.edu/maia (This website links directly to the project's GitHub repository containing the source code.) |
| Open Datasets | Yes | We give MAIA the ability to run such an experiment on the validation set of ImageNet (Deng et al., 2009) and construct the set of 15 images that maximally activate the system it is interpreting. and Dataset exemplars for synthetic neurons are calculated by computing 15 top-activating images per neuron from the CC3M dataset (Sharma et al., 2018). and We use the Spawrious dataset as it provides a more complex classification task than simpler binary spurious classification benchmarks like Waterbirds (Wah et al., 2011; Sagawa et al., 2020) and CelebA (Liu et al., 2015; Sagawa et al., 2020). (The exemplar-selection step is illustrated in the second sketch below the table.) |
| Dataset Splits | Yes | We use a 90-10 split to get a training set of size 22810 and a validation set of size 2534. and Next, for the balanced validation fine-tuning experiments, we sample ten balanced validation sets of size 320 and report the mean performances of each method. (The 90-10 split is illustrated in the third sketch below the table.) |
| Hardware Specification | No | MAIA is implemented with a GPT-4V vision-language model (VLM) backbone (OpenAI, 2023b). and `self.device = torch.device(f"cuda:{device}" if torch.cuda.is_available() else "cpu")`. The paper mentions using a GPU or CPU and a specific VLM backbone, but does not state the specific hardware models (e.g., NVIDIA A100, Intel Core i7) used to run the experiments. |
| Software Dependencies | Yes | `tools.text2image(prompts)` tool that synthesizes images by calling Stable Diffusion v1.5 (Rombach et al., 2022a) on text prompts. and Gemini 1.0 Pro (Anil et al., 2023). (A minimal text-to-image wrapper is sketched in the fourth block below the table.) |
| Experiment Setup | Yes | We train a ResNet-18 model (He et al., 2016) for one epoch on the O2O-Easy dataset from Spawrious using a learning rate of 1e-4, a weight decay of 1e-4, and a dropout of 0.1 on the final layer. (This configuration is sketched in the fifth block below the table.) |
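
The Pseudocode row quotes an API usage example from the paper's system prompt. Below is a cleaned-up, self-contained sketch of that example; the `System` and `Tools` stubs are hypothetical placeholders added here so the snippet runs end to end, and do not reproduce the paper's actual tool implementations.

```python
# Sketch of the MAIA API usage example quoted above. The System and Tools
# stubs are hypothetical stand-ins for the paper's API, added only so the
# example is runnable.
from typing import List, Tuple

import torch


class System:
    """Stub for the neuron being interpreted (placeholder, not the paper's class)."""

    def neuron(self, images: List[torch.Tensor]) -> Tuple[List[float], List[torch.Tensor]]:
        # Return a dummy activation value and activation map per image.
        return [0.0 for _ in images], [torch.zeros(7, 7) for _ in images]


class Tools:
    """Stub for MAIA's tool library (placeholder, not the paper's class)."""

    def text2image(self, prompts: List[str]) -> List[torch.Tensor]:
        # A real implementation would call a text-to-image model such as
        # Stable Diffusion v1.5; here we return blank images of the right shape.
        return [torch.zeros(3, 224, 224) for _ in prompts]


def run_experiment(system: System, tools: Tools) -> Tuple[List[float], List[torch.Tensor]]:
    # Test the activation value of the neuron for the prompt
    # "a dog standing on the grass", as in the paper's example.
    prompt = ["a dog standing on the grass"]
    images = tools.text2image(prompt)
    activation_list, activation_map_list = system.neuron(images)
    return activation_list, activation_map_list


if __name__ == "__main__":
    activations, maps = run_experiment(System(), Tools())
    print(activations, [m.shape for m in maps])
```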
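
The Open Datasets row mentions constructing the set of 15 images that maximally activate the interpreted unit (from the ImageNet validation set, or from CC3M for synthetic neurons). A minimal sketch of that exemplar selection follows; the `get_activation` callable and the generic `Dataset` wiring are illustrative assumptions rather than the paper's implementation.

```python
# Minimal sketch of computing dataset exemplars: the k images that maximally
# activate a given unit. The activation function and dataset are assumed here.
import heapq
from typing import Callable, List, Tuple

import torch
from torch.utils.data import DataLoader, Dataset


def top_k_exemplars(
    dataset: Dataset,
    get_activation: Callable[[torch.Tensor], float],
    k: int = 15,
) -> List[Tuple[float, int]]:
    """Return (activation, dataset index) pairs for the k most-activating images."""
    heap: List[Tuple[float, int]] = []
    loader = DataLoader(dataset, batch_size=1, shuffle=False)
    for idx, (image, _label) in enumerate(loader):
        act = float(get_activation(image))
        if len(heap) < k:
            heapq.heappush(heap, (act, idx))
        else:
            # Keep only the k largest activations seen so far.
            heapq.heappushpop(heap, (act, idx))
    return sorted(heap, reverse=True)
```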
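
The Dataset Splits row reports a 90-10 split yielding 22810 training and 2534 validation examples. One plausible realization is sketched below using `torch.utils.data.random_split`; the paper's excerpt does not say how the split was actually performed, and the placeholder dataset and seed are assumptions.

```python
# Illustrative 90-10 train/validation split; only the 22810 / 2534 sizes come
# from the paper, the splitting procedure and seed are assumptions.
import torch
from torch.utils.data import TensorDataset, random_split

full_dataset = TensorDataset(torch.arange(25344))  # placeholder for the real data

n_total = len(full_dataset)       # 25344
n_train = round(0.9 * n_total)    # 22810
n_val = n_total - n_train         # 2534

train_set, val_set = random_split(
    full_dataset, [n_train, n_val], generator=torch.Generator().manual_seed(0)
)
print(len(train_set), len(val_set))
```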
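
The Software Dependencies row notes that the `tools.text2image(prompts)` tool calls Stable Diffusion v1.5. A minimal wrapper using the Hugging Face `diffusers` library is sketched below; the checkpoint name `runwayml/stable-diffusion-v1-5` and the generation settings are assumptions, since the paper does not describe its wrapper.

```python
# Sketch of a text2image helper backed by Stable Diffusion v1.5 via diffusers.
# Checkpoint name and dtype handling are illustrative assumptions.
from typing import List

import torch
from diffusers import StableDiffusionPipeline
from PIL.Image import Image


def text2image(prompts: List[str]) -> List[Image]:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    ).to(device)
    return pipe(prompts).images


if __name__ == "__main__":
    images = text2image(["a dog standing on the grass"])
    images[0].save("dog.png")
```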
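
The Experiment Setup row gives concrete hyperparameters (ResNet-18, one epoch, learning rate 1e-4, weight decay 1e-4, dropout 0.1 on the final layer). The sketch below wires these together; the optimizer choice, the use of ImageNet-pretrained weights, the number of classes, and the data pipeline are assumptions not stated in the excerpt.

```python
# Sketch of the described ResNet-18 fine-tuning configuration on Spawrious
# O2O-Easy: one epoch, lr 1e-4, weight decay 1e-4, dropout 0.1 on the final
# layer. Optimizer, pretraining, class count, and data loading are assumed.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision.models import resnet18

NUM_CLASSES = 4  # assumed class count for Spawrious O2O-Easy

model = resnet18(weights="IMAGENET1K_V1")  # pretrained init is an assumption
model.fc = nn.Sequential(
    nn.Dropout(p=0.1),  # dropout of 0.1 on the final layer
    nn.Linear(model.fc.in_features, NUM_CLASSES),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()


def train_one_epoch(loader: DataLoader) -> None:
    model.train()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```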