Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models

Authors: Viacheslav Surkov, Chris Wendler, Antonio Mari, Mikhail Terekhov, Justin Deschenaux, Robert West, Caglar Gulcehre, David Bau

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We investigate the possibility of using SAEs to learn interpretable features for SDXL Turbo, a few-step text-to-image diffusion model. To this end, we train SAEs on the updates performed by transformer blocks within SDXL Turbo's denoising U-net in its 1-step setting. Interestingly, we find that they generalize to 4-step SDXL Turbo and even to the multi-step SDXL base model (i.e., a different model) without additional training. In addition, we show that their learned features are interpretable, causally influence the generation process, and reveal specialization among the blocks. We do so by creating RIEBench, a representation-based image editing benchmark, for editing images while they are generated by turning on and off individual SAE features.
Researcher Affiliation | Academia | Viacheslav Surkov¹, Chris Wendler², Antonio Mari¹, Mikhail Terekhov¹, Justin Deschenaux¹, Robert West¹, Caglar Gulcehre¹, David Bau²; ¹EPFL, ²Northeastern University
Pseudocode | No | The paper describes methods and equations, but does not present any explicitly labeled 'Pseudocode' or 'Algorithm' block with structured, numbered steps in a code-like format.
Open Source Code | Yes | We open-source the code of SDLens. [Footnote 2: Our project page links to all mentioned repositories and resources: https://sdxl-unbox.epfl.ch]
Open Datasets | Yes | ... on 1.5M LAION-COCO prompts [53, 54]. We then use these feature maps to train multiple SAEs for each transformer block. We sample our source and target prompts from PIEBench [26], a prompt-based image editing benchmark for diffusion inversion methods.
Dataset Splits | No | Each feature map has dimensions of 16 × 16, resulting in a training dataset of 384M dense feature vectors per transformer block. For the SAE training process, we followed the methodology described in [19]... We trained the SAEs on the 1-step generation process of SDXL Turbo... we compute their explained variance across the 100 randomly generated images for SDXL Turbo's 4-step setting and the vanilla SDXL base model's multi-step setting (see Fig. 3 left).
Hardware Specification | No | The Acknowledgements section mentions "EPFL RCP and IC clusters maintainers" but does not specify the types of compute workers (e.g., GPU/CPU models), memory, or specific configurations of these clusters used for the experiments.
Software Dependencies | No | The paper references various tools and models like "Grounded SAM2 [49]", "CLIP [44]", "GPT-4o [38]", and "Sentence-BERT... [48]", but it does not specify programming languages or library names with their exact version numbers that were used to implement the methodology.
Experiment Setup | Yes | Following [19], we set α = 1/32 and k_aux = 256, performed tied initialization of encoder and decoder, normalized decoder rows after each training step. The number of learned features n_f is set to 5120, which is four times the length of the input vector. The value of k is set to 10 as a good trade-off between sparsity and reconstruction quality. Other training hyperparameters are batch size: 4096, optimizer: Adam with learning rate: 10^-4 and betas: (0.9, 0.999).
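
For orientation, the quoted setup can be read as a top-k sparse autoencoder. The following is a minimal NumPy sketch of that reading, not the authors' SDLens code. The function names, the pre-bias term, the ReLU on the kept activations, and the pooled explained-variance formula are assumptions made for illustration. The input length of 1280 is inferred from the statement that n_f = 5120 is four times the input length. The auxiliary loss and the Adam training loop are omitted.

```python
import numpy as np

# Values quoted in the Experiment Setup and Dataset Splits rows.
N_FEATURES = 5120             # n_f, number of learned features
D_IN = N_FEATURES // 4        # 1280, inferred: n_f is four times the input length
K = 10                        # number of active features per input vector
N_VECTORS = 1_500_000 * 16 * 16  # 1.5M prompts x 16x16 feature map = 384M vectors


def init_tied(rng, d_in=D_IN, n_features=N_FEATURES):
    """Tied initialization: the encoder starts as the decoder's transpose."""
    w_dec = rng.standard_normal((n_features, d_in)).astype(np.float32)
    # Unit-norm decoder rows (the paper renormalizes after each training step).
    w_dec /= np.linalg.norm(w_dec, axis=1, keepdims=True)
    w_enc = w_dec.T.copy()
    b_pre = np.zeros(d_in, dtype=np.float32)  # assumed pre-encoder bias
    return w_enc, w_dec, b_pre


def topk_sae_forward(x, w_enc, w_dec, b_pre, k=K):
    """Return (sparse codes, reconstruction) for a batch x of shape (B, d_in)."""
    pre = (x - b_pre) @ w_enc                        # (B, n_features)
    idx = np.argpartition(pre, -k, axis=1)[:, -k:]   # k largest entries per row
    rows = np.arange(pre.shape[0])[:, None]
    codes = np.zeros_like(pre)
    codes[rows, idx] = np.maximum(pre[rows, idx], 0.0)  # assumed ReLU on kept values
    recon = codes @ w_dec + b_pre
    return codes, recon


def explained_variance(x, recon):
    """1 - residual sum of squares / total sum of squares (assumed definition)."""
    return 1.0 - np.sum((x - recon) ** 2) / np.sum((x - x.mean(axis=0)) ** 2)
```

A call such as `topk_sae_forward(x, *init_tied(np.random.default_rng(0)))` on a batch of shape (4096, 1280) would match the quoted batch size. Each row of the returned codes has at most k = 10 nonzero entries.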