Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
When Does Closeness in Distribution Imply Representational Similarity? An Identifiability Perspective
Authors: Beatrix Nielsen, Emanuele Marconato, Andrea Dittadi, Luigi Gresele
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct classification experiments on CIFAR-10 and find substantial representational dissimilarity between some trained models with similarly good performance (Section 5.1). We also run synthetic experiments showing that wider neural networks tend to learn distributions closer under our distance and have more similar representations (Section 5.2). |
| Researcher Affiliation | Academia | (1) Technical University of Denmark; (2) University of Trento, Italy; (3) Helmholtz AI, Munich; (4) Technical University of Munich; (5) Munich Center for Machine Learning (MCML); (6) University of Copenhagen |
| Pseudocode | Yes | In Algorithm 1, we report the algorithm in the case where we have $n$ samples of our $M$-dimensional random variables. In this case, we work with $(n \times M)$ matrices $Z$, $W$, where each row is a sample. The aim of the algorithm is to get projection vectors $u^{(r)}$, $v^{(r)}$ such that $\mathrm{Cov}_{ZW}[Z u^{(r)}, W v^{(r)}]$ is as large as possible under the constraint that $\lVert u \rVert = \lVert v \rVert = 1$. The original algorithm includes a choice of signs for the covariances. In this version of the algorithm, we have chosen all signs to be positive. $R > 0$ denotes the maximal rank of the algorithm. Algorithm 1: Iterative SVD Projection Extraction from Cross-Covariance Matrix. 1: Set $r \leftarrow 1$. 2: Center and scale $Z$ and $W$. 3: Compute cross-covariance matrix: $C \leftarrow Z^\top W$. 4: Set $C^{(1)} \leftarrow C$. 5: while $C^{(r)} \neq 0$ and $r \leq R$ do. 6: Perform SVD: $C^{(r)} = U D V^\top$. 7: Extract leading singular vectors: $u^{(r)}_1 \leftarrow$ first column of $U$, $v^{(r)}_1 \leftarrow$ first column of $V$. 8: Save $u^{(r)}_1$ and $v^{(r)}_1$ as the $r$-th projection vectors. 9: Let $\sigma_r \leftarrow$ leading singular value of $C^{(r)}$. 10: Update matrix: $C^{(r+1)} \leftarrow C^{(r)} - \sigma_r u^{(r)}_1 v^{(r)\top}_1$. 11: $r \leftarrow r + 1$. |
| Open Source Code | Yes | The code can be found on GitHub: github.com/bemigini/close-dist-rep-sim. |
| Open Datasets | Yes | We conduct classification experiments on CIFAR-10 [32] with two-dimensional embedding and unembedding representations. |
| Dataset Splits | Yes | CIFAR-10. We used the CIFAR-10 dataset as loaded with the torchvision package. The dataset is from [32] and contains 50,000 images for training and validation, and 10,000 images for testing. |
| Hardware Specification | Yes | Each model was trained using a single NVIDIA RTX A5000. |
| Software Dependencies | Yes | Python Packages. All experiments were conducted with Python 3.11 and PyTorch 2.5.1. |
| Experiment Setup | Yes | All models are trained with a batch size of 128 for 15,000 steps using the Adam optimizer [29]. |
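
The iterative SVD procedure quoted in the Pseudocode row can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation (which lives in the linked repository). The function name `iterative_svd_projections`, the tolerance `tol` used in place of an exact zero test, and the choice to scale each column to unit standard deviation are all assumptions, since the quoted text only says "center and scale".

```python
import numpy as np


def iterative_svd_projections(Z, W, max_rank, tol=1e-10):
    """Extract projection vectors by repeated SVD and rank-one deflation.

    Z, W: (n, M) arrays where each row is a sample.
    max_rank: the maximal rank R > 0.
    tol: hypothetical numerical threshold standing in for the test C != 0.
    """
    # Step 2: center and scale (assumption: zero mean, unit standard
    # deviation per column; columns are assumed to be non-constant).
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    W = (W - W.mean(axis=0)) / W.std(axis=0)

    # Steps 3-4: cross-covariance matrix, left unnormalized as in the
    # quoted pseudocode.
    C = Z.T @ W

    us, vs, sigmas = [], [], []
    r = 1
    # Step 5: continue while C is numerically non-zero and r <= R.
    while np.linalg.norm(C) > tol and r <= max_rank:
        # Step 6: SVD of the current (deflated) matrix.
        U, D, Vt = np.linalg.svd(C)
        # Steps 7-9: leading singular vectors and leading singular value.
        u, v, sigma = U[:, 0], Vt[0, :], D[0]
        us.append(u)
        vs.append(v)
        sigmas.append(sigma)
        # Step 10: subtract the leading rank-one component.
        C = C - sigma * np.outer(u, v)
        # Step 11: move on to the next projection.
        r += 1

    return np.array(us), np.array(vs), np.array(sigmas)
```

Each returned row of `us` and `vs` is a unit-norm projection vector, and `sigmas` holds the corresponding leading singular values in the order they were extracted. Because every step removes the current leading component, the extracted singular values are non-increasing.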