Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

When Does Closeness in Distribution Imply Representational Similarity? An Identifiability Perspective

Authors: Beatrix Nielsen, Emanuele Marconato, Andrea Dittadi, Luigi Gresele

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct classification experiments on CIFAR-10 and find substantial representational dissimilarity between some trained models with similarly good performance (Section 5.1). We also run synthetic experiments showing that wider neural networks tend to learn distributions closer under our distance and have more similar representations (Section 5.2).
Researcher Affiliation	Academia	Technical University of Denmark; University of Trento, Italy; Helmholtz AI, Munich; Technical University of Munich; Munich Center for Machine Learning (MCML); University of Copenhagen
Pseudocode	Yes	In Algorithm 1, we report the algorithm in the case where we have n samples of our M-dimensional random variables. In this case, we work with (n × M) matrices Z, W, where each row is a sample. The aim of the algorithm is to get projection vectors $u^{(r)}, v^{(r)}$ such that $\mathrm{Cov}_{Z,W}[Z u^{(r)}, W v^{(r)}]$ is as large as possible under the constraint that $\|u\| = \|v\| = 1$. The original algorithm includes a choice of signs for the covariances. In this version of the algorithm, we have chosen all signs to be positive. R > 0 denotes the maximal rank of the algorithm. Algorithm 1 (Iterative SVD Projection Extraction from Cross-Covariance Matrix): 1: Set $r \leftarrow 1$. 2: Center and scale Z and W. 3: Compute the cross-covariance matrix $C \leftarrow Z^\top W$. 4: Set $C^{(1)} \leftarrow C$. 5: while $C^{(r)} \neq 0$ and $r \leq R$ do 6: Perform SVD: $C^{(r)} = U D V^\top$. 7: Extract the leading singular vectors: $u_1^{(r)} \leftarrow$ first column of U, $v_1^{(r)} \leftarrow$ first column of V. 8: Save $u_1^{(r)}$ and $v_1^{(r)}$ as the r-th projection vectors. 9: Let $\sigma_r \leftarrow$ leading singular value of $C^{(r)}$. 10: Update the matrix: $C^{(r+1)} \leftarrow C^{(r)} - \sigma_r u_1^{(r)} v_1^{(r)\top}$. 11: $r \leftarrow r + 1$.
Open Source Code	Yes	The code can be found on GitHub: github.com/bemigini/close-dist-rep-sim.
Open Datasets	Yes	We conduct classification experiments on CIFAR-10 [32] with two-dimensional embedding and unembedding representations.
Dataset Splits	Yes	CIFAR-10. We used the CIFAR-10 dataset as loaded with the torchvision package. The dataset is from [32] and contains 50,000 images for training and validation, and 10,000 images for testing.
Hardware Specification	Yes	Each model was trained using a single NVIDIA RTX A5000.
Software Dependencies	Yes	Python Packages. All experiments are conducted with Python 3.11 and use PyTorch 2.5.1.
Experiment Setup	Yes	All models are trained with a batch size of 128 for 15,000 steps using the Adam optimizer [29].
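The Pseudocode row describes Algorithm 1 only in prose. Below is a minimal NumPy sketch of the same steps, offered as an illustration and not as the authors' implementation (their code is in the GitHub repository named in the Open Source Code row). The function name `iterative_svd_projections`, the `tol` threshold standing in for the test $C^{(r)} \neq 0$, and the division of the cross-covariance by n are assumptions; dividing by n rescales the singular values but leaves the singular vectors unchanged.

```python
import numpy as np


def iterative_svd_projections(Z, W, R, tol=1e-10):
    """Hypothetical sketch of Algorithm 1 (all covariance signs positive).

    Z, W: (n, M) arrays, one sample per row. Returns (M, k) arrays of
    projection vectors u_r, v_r (as columns) and the k singular values
    sigma_r, where k <= R.
    """
    # Step 2: center and scale each column (assumes no constant columns,
    # which would cause a division by zero).
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    W = (W - W.mean(axis=0)) / W.std(axis=0)
    n = Z.shape[0]
    # Steps 3-4: cross-covariance matrix; the 1/n factor is an assumption.
    C = Z.T @ W / n
    us, vs, sigmas = [], [], []
    r = 0
    # Step 5: loop while the residual is numerically nonzero and r < R.
    while r < R and np.linalg.norm(C) > tol:
        # Step 6: SVD, C = U diag(D) Vt, where Vt's rows are V's columns.
        U, D, Vt = np.linalg.svd(C)
        # Steps 7-9: leading singular pair and singular value.
        u, v, sigma = U[:, 0], Vt[0, :], D[0]
        us.append(u)
        vs.append(v)
        sigmas.append(sigma)
        # Step 10: deflate by removing the rank-one term just extracted.
        C = C - sigma * np.outer(u, v)
        r += 1  # Step 11
    return np.array(us).T, np.array(vs).T, np.array(sigmas)


# Usage on hypothetical synthetic data: W is a noisy linear map of Z.
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 3))
W = Z @ rng.normal(size=(3, 3)) + 0.1 * rng.normal(size=(500, 3))
U_hat, V_hat, sig = iterative_svd_projections(Z, W, R=2)
print(U_hat.shape, V_hat.shape, sig.shape)  # (3, 2) (3, 2) (2,)
```

Because each deflation removes exactly the leading rank-one term, the r-th extracted pair coincides with the r-th singular pair of the original cross-covariance matrix, so the projection vectors come out mutually orthonormal and the singular values are non-increasing.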
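The Open Datasets row mentions two-dimensional embedding and unembedding representations. The sketch below assumes the common parameterization in which a classifier's logits are inner products $f(x)^\top g(y)$ between an input embedding f(x) and a class unembedding g(y), turned into probabilities by a softmax over classes; the function name `class_probabilities` and all numeric inputs are hypothetical. It also checks a simple case in which two different pairs of representations define exactly the same conditional distribution: mapping f by an invertible matrix A and g by $A^{-\top}$ leaves every inner product, and hence every probability, unchanged.

```python
import numpy as np


def class_probabilities(f_x, G):
    """Hypothetical sketch: p(y | x) = softmax over y of f(x)^T g(y).

    f_x: (batch, d) input embeddings; G: (num_classes, d) unembeddings,
    one row g(y) per class. Returns (batch, num_classes) probabilities.
    """
    logits = f_x @ G.T
    # Subtract the row-wise max for numerical stability (softmax-invariant).
    logits = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
f_x = rng.normal(size=(5, 2))  # hypothetical 2-D embeddings of 5 inputs
G = rng.normal(size=(10, 2))   # hypothetical 2-D unembeddings, 10 classes
A = np.array([[2.0, 1.0], [0.0, 0.5]])  # any invertible 2x2 matrix

p = class_probabilities(f_x, G)
# Row-vector form of f -> A f and g -> A^{-T} g.
p_transformed = class_probabilities(f_x @ A.T, G @ np.linalg.inv(A))
print(np.allclose(p, p_transformed))  # True
```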
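The Experiment Setup row states the training budget in optimizer steps rather than epochs. The short calculation below converts it, assuming one batch of 128 images per step. The 50,000 figure is the combined training and validation set from the Dataset Splits row; the size of the training portion alone is not stated there, so the number of passes over the images actually used for training would be correspondingly higher than this estimate. The helper name `training_budget` is hypothetical.

```python
def training_budget(batch_size, steps, train_size):
    """Return (samples processed, equivalent number of epochs)."""
    samples_seen = batch_size * steps
    epochs = samples_seen / train_size
    return samples_seen, epochs


# 50,000 = train + validation images, an upper bound on the training set.
seen, epochs = training_budget(batch_size=128, steps=15_000, train_size=50_000)
print(seen, epochs)  # 1920000 38.4
```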