Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models

Authors: Mateusz Pach, Shyamgopal Karthik, Quentin Bouniot, Serge Belongie, Zeynep Akata

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experimental results reveal that SAEs trained on VLMs significantly enhance the monosemanticity of individual neurons, with sparsity and wide latents being the most influential factors. Further, we demonstrate that applying SAE interventions on CLIP's vision encoder directly steers multimodal LLM outputs (e.g., LLaVA), without any modifications to the underlying language model.
Researcher Affiliation	Academia	(1) Technical University of Munich; (2) Helmholtz Munich; (3) Munich Center for Machine Learning; (4) Munich Data Science Institute; (5) University of Tübingen; (6) University of Copenhagen
Pseudocode	No	The paper includes mathematical formulations and diagrams (e.g., Figure 2 for MS computation), but it does not present any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps.
Open Source Code	Yes	Code and benchmark data are available at https://github.com/ExplainableML/sae-for-vlm.
Open Datasets	Yes	The SAEs are trained on activation vectors pre-extracted from the model's responses to ImageNet [13] images. For CLIP, activation vectors are extracted from the classification (CLS) tokens in the residual stream after layers l ∈ {11, 17, 22, 23}, or from the output of the final projection layer.
Dataset Splits	Yes	Images I come from the training set of ImageNet. activations.csv: Provides activation values of all 50,000 ImageNet validation images for each neuron.
Hardware Specification	Yes	Experiments are run on a single NVIDIA A100 GPU. All experiments have been conducted on a single NVIDIA A100 GPU with either 40 or 80 GB memory.
Software Dependencies	No	The paper mentions the use of 'Adam optimizer' and 'GPT-4.1-mini' but does not provide specific version numbers for these or any other software libraries or tools used in the experiments.
Experiment Setup	Yes	We apply SAEs to explain fixed and pretrained CLIP ViT-L/14-336px [47], SigLIP SoViT-400m/14-384px [61], AIMv2 L/14-224px [19], and WebSSL MAE-300m/14-224px [18]. If not stated otherwise, we set the groups of Matryoshka SAEs as M = {0.0625ω, 0.1875ω, 0.4375ω, ω}, which roughly corresponds to doubling the size of the number of neurons added with each level down. For the BatchTopK activation, we fix the maximum number of non-zero latent neurons to K = 20. Both SAE types are compared across a wide range of expansion factors ε ∈ {1, 2, 4, 8, 16, 64}. All SAEs are optimized for 10^5 steps with minibatches of size 4096 using Adam optimizer [33], with the learning rate initialized at 16/(125√ω) following previous work [23].
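
The Matryoshka group fractions quoted in the Experiment Setup row can be checked with a little arithmetic. The sketch below is illustrative only and is not taken from the authors' repository. It assumes that ω denotes the SAE latent width and that ω = ε · d for an activation dimension d. The value d = 1024 and the function name `matryoshka_groups` are hypothetical choices made for this example.

```python
from fractions import Fraction

# Cumulative group fractions quoted in the Experiment Setup row:
# 0.0625 = 1/16, 0.1875 = 3/16, 0.4375 = 7/16, and the full width.
GROUP_FRACTIONS = [Fraction(1, 16), Fraction(3, 16), Fraction(7, 16), Fraction(1)]


def matryoshka_groups(omega):
    """Return (cumulative_sizes, added_per_level) for a latent width omega.

    cumulative_sizes[i] is the number of latents in nested group i;
    added_per_level[i] is how many latents group i adds over group i - 1.
    """
    cumulative = [int(f * omega) for f in GROUP_FRACTIONS]
    added = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
    return cumulative, added


if __name__ == "__main__":
    d = 1024  # hypothetical activation dimension, assumed for illustration
    for eps in (1, 2, 4, 8, 16, 64):  # expansion factors from the setup row
        omega = eps * d  # assumption: latent width = expansion factor * d
        cumulative, added = matryoshka_groups(omega)
        print(f"eps={eps:>2} omega={omega:>6} groups={cumulative} added={added}")
```

Under these assumptions, ω = 1024 gives nested groups of 64, 192, 448, and 1024 latents, so the levels add 64, 128, 256, and 576 latents respectively. Each increment is about twice the previous one, which matches the "roughly doubling" description in the row above.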