Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit
Authors: Valérie Costa, Thomas Fel, Ekdeep S Lubana, Bahareh Tolooshams, Demba Ba
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comparing this architecture with existing SAEs on a mixture of synthetic and natural data settings, we show: (i) hierarchical concepts induce conditionally orthogonal features, which existing SAEs are unable to faithfully capture, and (ii) the nonlinear encoding step of MP-SAE recovers highly meaningful features, helping us unravel shared structure in the seemingly dichotomous representation spaces of different modalities in a vision-language model, hence demonstrating the assumption that useful features are solely linearly accessible is insufficient. |
| Researcher Affiliation | Collaboration | (1) EPFL; (2) Kempner Institute, Harvard University; (3) CBS-NTT Program in Physics of Intelligence, Harvard University; (4) Physics of Artificial Intelligence Group, NTT Research, Inc., Sunnyvale, CA, USA; (5) University of Alberta; (6) Alberta Machine Intelligence Institute (Amii); (7) SEAS, Harvard University |
| Pseudocode | Yes | Algorithm 1 Matching Pursuit Sparse Autoencoders (MP-SAE) |
| Open Source Code | Yes | The exact tree configuration used in our experiments is available on the project's GitHub repository (footnote 4: https://github.com/mpsae/MP-SAE). |
| Open Datasets | Yes | Models were trained on IN1K [96] train set, using frozen representations from the final layer of each backbone. We train all SAEs on the full MS-COCO dataset [102], which consists of approximately 100,000 images and 500,000 associated captions. We perform preliminary experiments using a 4-layer Transformer model [145] pretrained on the TinyStories dataset [146]. |
| Dataset Splits | Yes | Training is conducted on the ImageNet-1K training set, comprising around 1.3 million images. We train all SAEs on the full MS-COCO dataset [102], which consists of approximately 100,000 images and 500,000 associated captions. Each training example corresponds to a single embedding vector (either image or text), and models are trained jointly across both modalities using a shared dictionary (expansion factor 25). Batch size is set to 5000 tokens, each of which is randomly sampled from samples (stories) in the train split of TinyStories. Training runs for approximately 40K iterations. All results in the following experiments are obtained on samples drawn from the eval split of TinyStories. |
| Hardware Specification | No | We do not report detailed compute resource specifications (e.g., GPU type, memory, or runtime). |
| Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper. |
| Experiment Setup | Yes | Training was performed for 50 epochs with Adam, using a learning rate of $5 \times 10^{-4}$ and cosine decay to $10^{-6}$ with warmup. Models were trained on IN1K [96] train set, using frozen representations from the final layer of each backbone. For ViT-style models (e.g., DINOv2), all spatial tokens and the CLS token were included (261 tokens per image for DINOv2, which results in approximately 25 billion training tokens for a training run). We employ a batch size of 8,000 tokens per step and train all models using the AdamW optimizer with a cosine learning rate schedule: the learning rate warms up from $10^{-6}$ to $5 \times 10^{-4}$ and decays back to $10^{-6}$ by the final epoch. A fixed weight decay of $10^{-5}$ is applied throughout. All SAEs utilize an expansion factor of 25, meaning the learned dictionary $D \in \mathbb{R}^{c \times d}$ satisfies $c = 25d$, where $d$ is the dimensionality of the input activations. Each column $D_i$ is constrained to lie on the unit $\ell_2$ ball: $\lVert D_i \rVert_2 \le 1$. The loss is the standard MSE. To maintain active support coverage, a revive factor of $10^{-5}$ is added to any pre-code unit that fails to activate in a given batch, slightly increasing its pre-activation to reintroduce gradient flow. For Vanilla SAEs, we apply an adaptive $\ell_1$ penalty: if the empirical $\ell_0$ sparsity of a batch exceeds a target threshold, the $\ell_1$ regularization weight is increased to suppress overactivation. All encoder architectures consist of a one-layer linear projection followed by a ReLU activation. |
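
The Experiment Setup row quotes a learning rate schedule, an expansion factor, and a norm constraint on dictionary atoms. The sketch below shows one way those three pieces could be written down in plain Python. It is not the authors' code: the linear shape of the warmup, the `warmup_steps` and `total_steps` parameters, and all function names are assumptions introduced for illustration, since the quoted text gives only the endpoint values ($10^{-6}$ and $5 \times 10^{-4}$), the factor 25, and the unit $\ell_2$ ball.

```python
import math

LR_MIN = 1e-6    # start and end learning rate quoted in the setup
LR_MAX = 5e-4    # peak learning rate quoted in the setup
EXPANSION = 25   # expansion factor quoted in the setup (c = 25 * d)


def learning_rate(step, total_steps, warmup_steps):
    """Warm up from LR_MIN to LR_MAX, then cosine-decay back to LR_MIN.

    The linear warmup and its length are assumptions; the source only
    states the endpoints and that the decay follows a cosine schedule.
    """
    if step < warmup_steps:
        return LR_MIN + (LR_MAX - LR_MIN) * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return LR_MIN + 0.5 * (LR_MAX - LR_MIN) * (1.0 + math.cos(math.pi * progress))


def dictionary_shape(d):
    """Shape (c, d) of the dictionary for input dimension d, with c = 25 * d."""
    return (EXPANSION * d, d)


def project_to_unit_ball(atom):
    """Rescale one dictionary atom so that its l2 norm is at most 1."""
    norm = math.sqrt(sum(v * v for v in atom))
    if norm <= 1.0:
        return list(atom)          # already inside the ball, leave unchanged
    return [v / norm for v in atom]


if __name__ == "__main__":
    # Hypothetical step counts, chosen only to exercise the schedule.
    for step in (0, 100, 550, 1000):
        print(step, learning_rate(step, total_steps=1000, warmup_steps=100))
    print(dictionary_shape(768))
    print(project_to_unit_ball([3.0, 4.0]))
```

Projecting after each optimizer step is one common way to enforce a norm constraint of this kind; the quoted text does not say how or when the constraint is applied, so that choice is also an assumption.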