Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Interpreting vision transformers via residual replacement model

Authors: Jinyeong Kim, Junhyeok Kim, Yumin Shim, Joohyeok Kim, Sunyoung Jung, Seong Jae Hwang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a large-scale feature analysis across all layers and multiple model variants, based on large-scale human annotations of a total of 6.6K features (Figure 1b). Remarkably, we find that early-layer features, though seemingly uninterpretable, are in fact consistently interpretable at the patch level. [...] Table 1: Circuit Evaluation. We evaluate the faithfulness, completeness, and causality of the circuits using 1,500 randomly sampled images from the ImageNet validation set.
Researcher Affiliation | Academia | Jinyeong Kim Junhyeok Kim Yumin Shim Joohyeok Kim Sunyoung Jung Seong Jae Hwang Yonsei University EMAIL
Pseudocode | No | The paper describes methods and procedures using natural language and mathematical equations (e.g., Equations 1-8) but does not include any clearly labeled pseudocode or algorithm blocks. While Section 3.1 outlines steps for 'Discovering Circuits', it is presented as a list of descriptive sentences rather than a formal algorithm structure.
Open Source Code | Yes | https://github.com/rubato-yeong/RRM
Open Datasets | Yes | Training and evaluation are performed on the ImageNet-1K [106] training/validation dataset, respectively. For each model, we collect all tokens ([CLS], patch tokens, and register tokens (if present)) from each layer to train the TopK SAE.
Dataset Splits | Yes | Training and evaluation are performed on the ImageNet-1K [106] training/validation dataset, respectively. [...] We evaluate the faithfulness, completeness, and causality of the circuits using 1,500 randomly sampled images from the ImageNet validation set.
Hardware Specification | Yes | Each SAE was trained using an RTX 3090 GPU for approximately 12–14 hours per model. [...] As a result, we can compute the full edge importance in a few seconds for a single image using a single RTX A6000 GPU.
Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers, such as Python, PyTorch, or CUDA versions. It describes implementation details for training models and SAEs, but without listing the required software stack.
Experiment Setup | Yes | We trained the TopK SAE using the Adam optimizer with a learning rate of 2e-4, β1 = 0.9, and β2 = 0.999. The model was trained for 50 epochs with a batch size of 512. Hyperparameters are set as follows: α = 1/32, k_aux = 256. Decoder normalization is applied at each training step, and the input x is normalized to have zero mean and unit variance.
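
The Experiment Setup entry can be read as a small configuration plus three preprocessing steps. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the names SAETrainConfig, standardize, top_k, and unit_norm are hypothetical, the sparsity level k is a placeholder because the entry does not report it, and L2 normalization of each decoder direction is an assumed reading of "decoder normalization".

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class SAETrainConfig:
    # Values as reported in the Experiment Setup entry.
    learning_rate: float = 2e-4
    adam_beta1: float = 0.9
    adam_beta2: float = 0.999
    epochs: int = 50
    batch_size: int = 512
    alpha: float = 1 / 32
    k_aux: int = 256
    # Hypothetical placeholder: the entry does not state the TopK sparsity level.
    k: int = 32


def standardize(x):
    """Shift and scale a vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    std = math.sqrt(var) or 1.0  # fall back to 1.0 for constant inputs
    return [(v - mean) / std for v in x]


def top_k(activations, k):
    """Keep the k largest activations and zero out the rest."""
    order = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    keep = set(order[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]


def unit_norm(direction):
    """Rescale one decoder direction to unit L2 norm (assumed normalization)."""
    norm = math.sqrt(sum(v * v for v in direction)) or 1.0
    return [v / norm for v in direction]
```

In a real training loop these steps would operate on batched tensors, with the optimizer settings taken from the configuration; the list-based versions here only illustrate what each step computes.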