Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Generalized Linear Mode Connectivity for Transformers

Authors: Alexander Theus, Alessandro Cabodi, Sotiris Anagnostidis, Antonio Orvieto, Sidak Pal Singh, Valentina Boeva

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results demonstrate for the first time low- and zero-loss linear connections between independently trained Vision Transformers and GPT-2 models, even in multi-model settings.
Researcher Affiliation | Academia | ¹ETH Zürich, ²MPI for Learning Systems, ³ELLIS Institute Tübingen, ⁴MPI for Intelligent Systems, ⁵Tübingen AI Center, ⁶Swiss Institute of Bioinformatics, ⁷Université Paris Cité, Institut Cochin, INSERM U1016
Pseudocode | Yes | Algorithm 1: Learning matching via task loss
Open Source Code | Yes | Our code is available here.
Open Datasets | Yes | We evaluate the proposed model alignment methods on two Transformer architectures: Vision Transformers (ViTs) and GPT-2, spanning vision and language tasks. To measure LMC, we compute the loss barrier between two models θA and θB as defined in Equation 1 on the test split. ... For Vision Transformers (ViTs) and GPT-2 across two datasets each. Component: ViT (CIFAR-10/100, Tiny ImageNet); GPT-2 (Tiny Shakespeare, BookCorpus)
Dataset Splits | Yes | To measure LMC, we compute the loss barrier between two models θA and θB as defined in Equation 1 on the test split. ... On the Tiny ImageNet test dataset, the models achieve an accuracy of 44.19 ± 0.17 and a calibrated loss of 2.54 ± 0.02. For the CIFAR-10 test dataset, they obtain an accuracy of 83.81 ± 0.44 and a loss of 0.57 ± 0.01.
Hardware Specification | Yes | Table 3: Configuration and training details for Vision Transformer (ViT) and GPT-2 across two datasets each. ... Hardware: 1× RTX 2060 (ViT, CIFAR-10/100), 1× RTX 4090 (ViT, Tiny ImageNet), 1× RTX 4090 (GPT-2, Tiny Shakespeare), 4× RTX 4090 (GPT-2, BookCorpus)
Software Dependencies | No | The paper mentions optimizers (AdamW) and tools like 'GPT-2 tokenizer' and 'mixed-precision (fp16) training', but does not provide specific version numbers for any software libraries or frameworks like Python, PyTorch, TensorFlow, or CUDA.
Experiment Setup | Yes | Table 3: Configuration and training details for Vision Transformer (ViT) and GPT-2 across two datasets each.
    Component | ViT, CIFAR-10/100 | ViT, Tiny ImageNet | GPT-2, Tiny Shakespeare | GPT-2, BookCorpus
    Transformer layers | 6 | 8 | 6 | 6
    Attention heads | 8 | 8 | 4 | 8
    Embedding dimension | 256 | 384 | 256 | 512
    MLP hidden dimension | 512 | 768 | 1024 | 2048
    Patch size | 4 × 4 | 8 × 8 | - | -
    Sequence length | - | - | 256 | 512
    Training epochs | 150 | 150 | 100 | 5
    Batch size | 128 | 128 | 32 | 64
    Optimizer | AdamW | AdamW | AdamW | AdamW
    Learning rate | 3 × 10⁻⁴ | 3 × 10⁻⁴ | 3 × 10⁻⁴ | 2.5 × 10⁻⁴
    Weight decay | 1 × 10⁻³ | 0.05 | 0.01 | 0.01
    Learning rate schedule | Cosine annealing | Cosine (warmup) | Cosine (warmup) | Cosine (warmup)
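
The rows above quote the paper as measuring LMC through the loss barrier between two models θA and θB on the test split, but Equation 1 itself is not reproduced on this page. The sketch below therefore assumes the commonly used definition: the largest gap, over interpolation weights λ in [0, 1], between the loss at the interpolated parameters and the linear interpolation of the two endpoint losses. The names `interpolate`, `loss_barrier`, `double_well`, and `convex` are hypothetical, and the toy one-dimensional losses stand in for evaluating a real model on a test split; the alignment step the paper applies before interpolating is not modeled here.

```python
from typing import Callable, List, Sequence

Params = Sequence[float]


def interpolate(theta_a: Params, theta_b: Params, lam: float) -> List[float]:
    """Return lam * theta_a + (1 - lam) * theta_b, element-wise."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(theta_a, theta_b)]


def loss_barrier(
    loss_fn: Callable[[Params], float],
    theta_a: Params,
    theta_b: Params,
    num_points: int = 11,
) -> float:
    """Assumed definition (not taken from the paper's Equation 1):
    max over lam of loss(interpolated) - (lam * loss(a) + (1 - lam) * loss(b)),
    evaluated on an evenly spaced grid of lam values in [0, 1]."""
    loss_a = loss_fn(theta_a)
    loss_b = loss_fn(theta_b)
    gaps = []
    for i in range(num_points):
        lam = i / (num_points - 1)
        interp_loss = loss_fn(interpolate(theta_a, theta_b, lam))
        baseline = lam * loss_a + (1.0 - lam) * loss_b
        gaps.append(interp_loss - baseline)
    return max(gaps)


def double_well(theta: Params) -> float:
    """Toy loss with two separate minima at x = -1 and x = +1."""
    x = theta[0]
    return (x * x - 1.0) ** 2


def convex(theta: Params) -> float:
    """Toy convex loss; the straight path between two points has no barrier."""
    x = theta[0]
    return x * x


barrier_two_minima = loss_barrier(double_well, [-1.0], [1.0])
barrier_convex = loss_barrier(convex, [-1.0], [1.0])
```

In this toy setting the double-well loss yields a positive barrier between its two minima, while the convex loss yields none. A barrier near zero is what "linear mode connectivity" refers to in the quoted text.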