Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Beyond Scalars: Concept-Based Alignment Analysis in Vision Transformers

Authors: Johanna Vielhaben, Dilyara Bareeva, Jim Berend, Wojciech Samek, Nils Strodthoff

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate concept discovery in Sec. 4.1, check the superiority of our new concept definition over linear and simplified baselines for concept alignment analysis in Sec. 4.2, and perform a concept-alignment analysis between four ViTs in Sec. 4.3.
Researcher Affiliation	Academia	Johanna Vielhaben, Fraunhofer HHI; Dilyara Bareeva, Fraunhofer HHI; Jim Berend, Fraunhofer HHI; Wojciech Samek, Fraunhofer HHI & Technical University of Berlin; Nils Strodthoff, Carl von Ossietzky Universität Oldenburg
Pseudocode	No	After concept discovery with HDBSCAN, we compute concept proximity scores P{φ} = {P(φ_0), ..., P(φ_N)}, P ∈ [0, 1]^n holds the concept membership scores P_α(φ) of each concept C_α. These rely on the implementation of soft clustering with HDBSCAN from [30], which we formalize here for the reader's convenience. Clustering: HDBSCAN first transforms the feature space using a density-informed metric called mutual reachability distance... Soft clustering with HDBSCAN: The soft cluster membership scores combine a distance-based membership with an outlier score.
Open Source Code	Yes	We provide our code and use models, datasets and methods that are publicly available.
Open Datasets	Yes	For concept discovery and later analysis of representational alignment, we use a random subset of 25% of the ImageNet train set, stratified samples across all 1000 classes. [37] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015. doi: 10.1007/s11263-015-0816-y.
Dataset Splits	Yes	For concept discovery and later analysis of representational alignment, we use a random subset of 25% of the ImageNet train set, stratified samples across all 1000 classes.
Hardware Specification	Yes	All experiments were conducted on a Tesla V100 GPU.
Software Dependencies	No	We use the HDBSCAN implementation from [30]. We use the cuML [36] versions of HDBSCAN and UMAP for computation on the GPU. While these refer to specific implementations and libraries, they do not provide explicit version numbers (e.g., 'hdbscan==0.8.27') for the software components as required.
Experiment Setup	Yes	Hyperparameters for UMAP and HDBSCAN... Minimal distance in UMAP: ... We use a value of 0.01 in all experiments. Number of neighbours in UMAP: ... We use a value of 30 in all experiments. Embedding dimensionality in UMAP: We use the practical limit for HDBSCAN of F = 50 in all experiments. Minimum cluster size in HDBSCAN: ... We use a value of 50 in all experiments. Min samples in HDBSCAN: ... We use a value of 20 in all experiments.
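The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is an illustration, not the authors' code. It assumes the CPU packages umap-learn and hdbscan in place of the cuML versions the paper reports using. The mapping from the quoted descriptions to keyword names (min_dist, n_neighbors, n_components, min_cluster_size, min_samples) is also an assumption based on those libraries' public parameters, and the helper soft_cluster is a hypothetical name.

```python
"""Sketch: UMAP and HDBSCAN settings quoted in the Experiment Setup row."""

# Values as quoted in the table; the keyword names are assumed to match
# the umap-learn and hdbscan parameter names.
UMAP_PARAMS = {
    "min_dist": 0.01,    # "Minimal distance in UMAP"
    "n_neighbors": 30,   # "Number of neighbours in UMAP"
    "n_components": 50,  # "Embedding dimensionality in UMAP" (F = 50)
}
HDBSCAN_PARAMS = {
    "min_cluster_size": 50,  # "Minimum cluster size in HDBSCAN"
    "min_samples": 20,       # "Min samples in HDBSCAN"
}


def soft_cluster(features):
    """Embed features with UMAP, cluster with HDBSCAN, return soft memberships.

    Hypothetical helper. Assumption: uses the CPU libraries, not cuML.
    """
    # Imported lazily so the configuration above is usable without them.
    import hdbscan  # pip install hdbscan
    import umap     # pip install umap-learn

    embedding = umap.UMAP(**UMAP_PARAMS).fit_transform(features)
    # prediction_data=True is required for soft membership vectors.
    clusterer = hdbscan.HDBSCAN(prediction_data=True, **HDBSCAN_PARAMS)
    clusterer.fit(embedding)
    memberships = hdbscan.all_points_membership_vectors(clusterer)
    return clusterer.labels_, memberships
```

Calling soft_cluster on an array of shape (n_samples, n_features) would return hard labels plus one membership vector per sample. With a minimum cluster size of 50 and a 50-dimensional embedding, the input needs far more than a few hundred samples for the result to be meaningful.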