Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Connecting Neural Models Latent Geometries with Relative Geodesic Representations

Authors: Hanlin Yu, Berfin Inal, Georgios Arvanitidis, Søren Hauberg, Francesco Locatello, Marco Fumero

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validate experimentally our method on model stitching and retrieval tasks, covering autoencoders and vision foundation discriminative models, across diverse architectures, datasets, pretraining schemes and modalities.
Researcher Affiliation	Academia	Hanlin Yu, University of Helsinki; Berfin Inal, University of Amsterdam; Georgios Arvanitidis, DTU; Søren Hauberg, DTU; Francesco Locatello, IST Austria; Marco Fumero, IST Austria
Pseudocode	Yes	Algorithm 1 Relative Geodesic Representations
Open Source Code	Yes	Code is available at https://github.com/marc0git/Relative Geodesics.
Open Datasets	Yes	For the following experiment, we trained pairs of convolutional autoencoders (F1, F2) with different initializations on MNIST [Deng, 2012], Fashion MNIST [Xiao et al., 2017], CIFAR10 [Krizhevsky, 2009] datasets. [...] We perform experiments on retrieval tasks on pretrained vision foundation models, investigating how well we can match representations together with different backbones subject to the decoding tasks, on 5 datasets, varying in complexity and size: CIFAR10, CIFAR100 [Krizhevsky, 2009], SVHN [Yuval Netzer et al., 2011], CUB [Wah et al., 2023], and ImageNet-1k [Russakovsky et al., 2015].
Dataset Splits	Yes	Unless otherwise stated, we directly use the original test set of the dataset as the test set, while using 0.9 of the original train set as the train set and the remaining as the validation set. Both the anchors and the Diet data points are selected from the validation set.
Hardware Specification	Yes	Experiments regarding the geodesic approximation are conducted using NVIDIA A100 GPU and 12 CPU cores. [...] The autoencoder stitching and retrieval experiments were conducted on a single NVIDIA RTX 3080TI GPU. Experiments involving vision foundation models were run on a compute cluster, each job using a single NVIDIA A100 GPU and 10 CPU cores, with runtimes of several hours.
Software Dependencies	No	True geodesics are computed using Stochman library [Detlefsen et al., 2021], which has Apache-2.0 license, which wraps the decoder into a pullback manifold, initializes a parameterized spline path between codes, and then optimizes its parameters to minimize the Riemannian energy.
Experiment Setup	Yes	Each model was trained for 50 epochs, reaching convergence, using the Adam optimizer [Kingma and Ba, 2017] with a batch size of 64. We set the learning rate to 0.001, and fixed a random seed of 42 to ensure reproducibility.
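
The Dataset Splits and Experiment Setup rows above can be illustrated with a small sketch. It is not the authors' code: the function name `split_train_val`, the `TRAIN_CONFIG` dictionary, the use of a shuffled split, and the reuse of seed 42 for the split are all assumptions made for illustration. Only the 0.9 train fraction and the listed hyperparameters come from the quoted text.

```python
import random


def split_train_val(num_train, train_fraction=0.9, seed=42):
    """Return (train_indices, val_indices) for the original training set.

    Assumption: the split is a seeded random shuffle. The quoted text only
    states that 0.9 of the original train set is used for training and the
    remainder for validation.
    """
    indices = list(range(num_train))
    random.Random(seed).shuffle(indices)  # local RNG, global state untouched
    cut = int(round(num_train * train_fraction))
    return indices[:cut], indices[cut:]


# Hyperparameters as reported in the Experiment Setup row.
# The dictionary itself is a hypothetical container, not the authors' config.
TRAIN_CONFIG = {
    "epochs": 50,
    "optimizer": "Adam",
    "batch_size": 64,
    "learning_rate": 1e-3,
    "seed": 42,
}

# 60,000 is the size of the original MNIST training set.
train_idx, val_idx = split_train_val(60_000, seed=TRAIN_CONFIG["seed"])
print(len(train_idx), len(val_idx))  # prints "54000 6000"
```

Under this reading, the anchors and any other auxiliary data points would be drawn from `val_idx`, while the original test set stays untouched.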