Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
Authors: Mateusz Pach, Shyamgopal Karthik, Quentin Bouniot, Serge Belongie, Zeynep Akata
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results reveal that SAEs trained on VLMs significantly enhance the monosemanticity of individual neurons, with sparsity and wide latents being the most influential factors. Further, we demonstrate that applying SAE interventions on CLIP's vision encoder directly steers multimodal LLM outputs (e.g., LLaVA), without any modifications to the underlying language model. |
| Researcher Affiliation | Academia | 1 Technical University of Munich; 2 Helmholtz Munich; 3 Munich Center for Machine Learning; 4 Munich Data Science Institute; 5 University of Tübingen; 6 University of Copenhagen |
| Pseudocode | No | The paper includes mathematical formulations and diagrams (e.g., Figure 2 for MS computation), but it does not present any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps. |
| Open Source Code | Yes | Code and benchmark data are available at https://github.com/ExplainableML/sae-for-vlm. |
| Open Datasets | Yes | The SAEs are trained on activation vectors pre-extracted from the model's responses to ImageNet [13] images. For CLIP, activation vectors are extracted from the classification (CLS) tokens in the residual stream after layers l ∈ {11, 17, 22, 23}, or from the output of the final projection layer. |
| Dataset Splits | Yes | Images I come from training set of the ImageNet. activations.csv: Provides activation values of all 50,000 ImageNet validation images for each neuron. |
| Hardware Specification | Yes | Experiments are run on a single NVIDIA A100 GPU. All experiments have been conducted on a single NVIDIA A100 GPU with either 40 or 80 GB memory. |
| Software Dependencies | No | The paper mentions the use of 'Adam optimizer' and 'GPT-4.1-mini' but does not provide specific version numbers for these or any other software libraries or tools used in the experiments. |
| Experiment Setup | Yes | We apply SAEs to explain fixed and pretrained CLIP ViT-L/14-336px [47], SigLIP SoViT-400m/14-384px [61], AIMv2 L/14-224px [19], and WebSSL MAE-300m/14-224px [18]. If not stated otherwise, we set the groups of Matryoshka SAEs as M = {0.0625ω, 0.1875ω, 0.4375ω, ω}, which roughly corresponds to doubling the size of the number of neurons added with each level down. For the BatchTopK activation, we fix the maximum number of non-zero latent neurons to K = 20. Both SAE types are compared across a wide range of expansion factors ε ∈ {1, 2, 4, 8, 16, 64}. All SAEs are optimized for 10^5 steps with minibatches of size 4096 using Adam optimizer [33], with the learning rate initialized at 16/(125√ω) following previous work [23]. |
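
To make the numbers in the Experiment Setup row concrete, the following is a minimal Python sketch, not the authors' code. It derives the Matryoshka group sizes from a latent width ω and applies a simplified batch-level top-K sparsification. Several things here are assumptions rather than statements from the paper: that ω equals the expansion factor ε times the input activation dimension, that the input dimension is 1024, that group sizes are rounded down to integers, and that only positive activations are kept. The function names `matryoshka_groups` and `batch_top_k` are hypothetical.

```python
from fractions import Fraction

# Values quoted in the Experiment Setup row.
GROUP_FRACTIONS = [Fraction(1, 16), Fraction(3, 16), Fraction(7, 16), Fraction(1)]
EXPANSION_FACTORS = [1, 2, 4, 8, 16, 64]
K = 20


def matryoshka_groups(omega):
    """Return (cumulative group sizes M, neurons added at each level).

    Assumption: sizes are rounded down to whole neurons.
    """
    sizes = [int(f * omega) for f in GROUP_FRACTIONS]
    added = [sizes[0]] + [b - a for a, b in zip(sizes, sizes[1:])]
    return sizes, added


def batch_top_k(batch, k):
    """Keep the k * len(batch) largest activations across the whole batch.

    Simplified reading of a batch-level top-K: the budget is shared by all
    samples, so one sample may keep more than k entries and another fewer.
    Assumption: non-positive activations are never kept.
    """
    budget = k * len(batch)
    ranked = sorted(
        ((v, i, j) for i, row in enumerate(batch) for j, v in enumerate(row)),
        reverse=True,
    )
    keep = {(i, j) for v, i, j in ranked[:budget] if v > 0}
    return [
        [v if (i, j) in keep else 0.0 for j, v in enumerate(row)]
        for i, row in enumerate(batch)
    ]


if __name__ == "__main__":
    input_dim = 1024  # assumed activation dimension, for illustration only
    for eps in EXPANSION_FACTORS:
        omega = eps * input_dim  # assumed relation between epsilon and omega
        sizes, added = matryoshka_groups(omega)
        print(f"eps={eps:>2} omega={omega:>6} M={sizes} added={added}")
```

With ω = 8192 the cumulative sizes are 512, 1536, 3584, and 8192, so the levels add 512, 1024, 2048, and 4608 neurons. The first three increments double exactly and the last is slightly more than double, which matches the "roughly doubling" wording in the row above.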