Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Attention (as Discrete-Time Markov) Chains

Authors: Yotam Erel, Olaf Dünkel, Rishabh Dabral, Vladislav Golyanik, Christian Theobalt, Amit Bermano

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	This section provides extensive evidence that our framework and the extended operations over the attention matrices are useful for a variety of downstream tasks. We start with improving zero-shot segmentation over a standard benchmark using multi-bounce attention (Section 5.1), followed by showing that Token Rank extracts a more informative signal than existing baselines (Section 5.2), useful for visualization purposes. Then, we demonstrate that applying Token Rank to existing techniques improves unconditional generation of images (Section 5.3) and segmentation (Section 5.4). This is supplemented with a token masking experiment (Section 5.5), where we deliberately mask tokens according to their Token Rank importance and measure downstream accuracy for image classification.
Researcher Affiliation	Academia	¹Tel Aviv University; ²MPI for Informatics, SIC
Pseudocode	No	The paper describes methods and mathematical formulations in paragraph text and equations (e.g., in Sections 3.1 and 4), but it does not contain any distinct, structured pseudocode or algorithm blocks labeled as such or formatted like code.
Open Source Code	Yes	We open-source the code at https://github.com/yoterel/attention_chains_code to reproduce the results.
Open Datasets	Yes	We evaluate on the ImageNet Segmentation benchmark (Chefer et al., 2021b)... where the ground truth dataset is the LAION-Aesthetics V2 dataset (Schuhmann et al., 2022) originally used to train SD1.5... We incorporate Token Rank into DiffSeg (Tian et al., 2024)... using the COCO-Stuff-27 benchmark... for the ImageNet (Deng et al., 2009) subset Imagenette (Howard, 2019)
Dataset Splits	Yes	Finally, we train a linear classifier with the commonly used train and test split and with the attention visualizations of all layers and heads as input... We evaluate the accuracy degradation of several vision transformer-based classifiers over 5000 randomly selected images from ImageNet with a fixed seed.
Hardware Specification	Yes	For all computations, we used an internal GPU cluster consisting of NVIDIA A40, A100, H100, and RTX 8000 GPUs.
Software Dependencies	No	The paper mentions models like SD1.5 and Flux-Schnell DiT, and an optimizer like SGD, but it does not provide specific version numbers for software libraries or dependencies (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup	Yes	For the SAG experiments, we build on top of the code of (Hong et al., 2023)... We use SD1.5 (Rombach et al., 2022) as the foundational model and use the second last decoder block's last self-attention layer for masking. ... We use the unconditional branch only, and allow up to 40% of the attention matrix to be masked, following the original implementation. ...optimized with the SGD optimizer with a set of learning rates ({1e-5, 2e-5, 5e-5, 1e-4, 2e-4, 5e-4, 1e-3}) for 20 epochs...
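The record refers to multi-bounce attention and Token Rank without defining them. As a rough illustration only, the sketch below assumes, following the paper's framing of attention as a discrete-time Markov chain, that a row-stochastic attention matrix acts as the transition matrix P, that multi-bounce attention propagates a distribution over tokens through several steps (x P^n), and that Token Rank is the chain's stationary distribution (pi = pi P), computed here by power iteration. The helper names (`to_transition_matrix`, `multi_bounce`, `token_rank`, `rank_order`) and the toy matrix are hypothetical and are not taken from the authors' repository, which should be treated as the reference implementation.

```python
import numpy as np


def to_transition_matrix(attn) -> np.ndarray:
    """Row-normalize an attention matrix so each row is a probability distribution.

    Softmax attention rows already sum to 1, so this is a no-op there; it only
    matters for masked or re-weighted matrices. (Assumption: rows = queries.)
    """
    attn = np.asarray(attn, dtype=float)
    return attn / attn.sum(axis=1, keepdims=True)


def multi_bounce(state, attn, bounces: int) -> np.ndarray:
    """Propagate a distribution over tokens through `bounces` attention steps (x P^n)."""
    P = to_transition_matrix(attn)
    x = np.asarray(state, dtype=float)
    for _ in range(bounces):
        x = x @ P
    return x


def token_rank(attn, tol: float = 1e-10, max_iter: int = 10_000) -> np.ndarray:
    """Stationary distribution pi with pi = pi P, found by power iteration.

    Convergence assumes the chain is irreducible and aperiodic; softmax
    attention satisfies this because every entry is strictly positive.
    """
    P = to_transition_matrix(attn)
    pi = np.full(P.shape[0], 1.0 / P.shape[0])  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = pi @ P
        if np.abs(nxt - pi).sum() < tol:  # L1 change between iterates
            return nxt
        pi = nxt
    return pi


def rank_order(pi) -> np.ndarray:
    """Token indices sorted from highest to lowest stationary probability."""
    return np.argsort(-np.asarray(pi), kind="stable")


# Toy 3-token attention matrix (rows sum to 1). It has zeros, so it is not
# strictly positive, but self-loops make it irreducible and aperiodic.
A = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

print(multi_bounce(np.array([1.0, 0.0, 0.0]), A, bounces=2))  # ≈ [0.375, 0.5, 0.125]
print(token_rank(A))                                          # ≈ [0.25, 0.5, 0.25]
print(rank_order(token_rank(A)))                              # token 1 ranks first
```

Power iteration is used here only because it is short and needs no eigen-solver; for small matrices, taking the left eigenvector of P for eigenvalue 1 (e.g. via `np.linalg.eig` on `P.T`) would give the same distribution.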