Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Generalized Linear Mode Connectivity for Transformers
Authors: Alexander Theus, Alessandro Cabodi, Sotiris Anagnostidis, Antonio Orvieto, Sidak Pal Singh, Valentina Boeva
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results demonstrate for the first time low- and zero-loss linear connections between independently trained Vision Transformers and GPT-2 models, even in multi-model settings. |
| Researcher Affiliation | Academia | ¹ETH Zürich, ²MPI for Learning Systems, ³ELLIS Institute Tübingen, ⁴MPI for Intelligent Systems, ⁵Tübingen AI Center, ⁶Swiss Institute of Bioinformatics, ⁷Université Paris Cité, Institut Cochin, INSERM U1016 |
| Pseudocode | Yes | Algorithm 1 Learning matching via task loss |
| Open Source Code | Yes | Our code is available here. |
| Open Datasets | Yes | We evaluate the proposed model alignment methods on two Transformer architectures: Vision Transformers (ViTs) and GPT-2, spanning vision and language tasks. To measure LMC, we compute the loss barrier between two models θ_A and θ_B as defined in Equation 1 on the test split. ... For Vision Transformers (ViTs) and GPT-2 across two datasets each. Component: ViT (CIFAR-10/100, Tiny ImageNet); GPT-2 (Tiny Shakespeare, BookCorpus). |
| Dataset Splits | Yes | To measure LMC, we compute the loss barrier between two models θ_A and θ_B as defined in Equation 1 on the test split. ... On the Tiny ImageNet test dataset, the models achieve an accuracy of 44.19 ± 0.17 and a calibrated loss of 2.54 ± 0.02. For the CIFAR-10 test dataset, they obtain an accuracy of 83.81 ± 0.44 and a loss of 0.57 ± 0.01. |
| Hardware Specification | Yes | Table 3: Configuration and training details for Vision Transformer (ViT) and GPT-2 across two datasets each. ... Hardware: 1× RTX 2060 (ViT, CIFAR-10/100); 1× RTX 4090 (ViT, Tiny ImageNet); 1× RTX 4090 (GPT-2, Tiny Shakespeare); 4× RTX 4090 (GPT-2, BookCorpus). |
| Software Dependencies | No | The paper mentions optimizers (AdamW) and tools like 'GPT-2 tokenizer' and 'mixed-precision (fp16) training', but does not provide specific version numbers for any software libraries or frameworks like Python, PyTorch, TensorFlow, or CUDA. |
| Experiment Setup | Yes | Table 3: Configuration and training details for Vision Transformer (ViT) and GPT-2 across two datasets each. Values are listed in column order: ViT on CIFAR-10/100, ViT on Tiny ImageNet, GPT-2 on Tiny Shakespeare, GPT-2 on BookCorpus. Transformer layers: 6, 8, 6, 6. Attention heads: 8, 8, 4, 8. Embedding dimension: 256, 384, 256, 512. MLP hidden dimension: 512, 768, 1024, 2048. Patch size (ViT only): 4×4, 8×8. Sequence length (GPT-2 only): 256, 512. Training epochs: 150, 150, 100, 5. Batch size: 128, 128, 32, 64. Optimizer: AdamW, AdamW, AdamW, AdamW. Learning rate: 3×10⁻⁴, 3×10⁻⁴, 3×10⁻⁴, 2.5×10⁻⁴. Weight decay: 1×10⁻³, 0.05, 0.01, 0.01. Learning rate schedule: Cosine annealing, Cosine (warmup), Cosine (warmup), Cosine (warmup). |
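
The quoted responses refer to a loss barrier between two models, "as defined in Equation 1", but that equation is not reproduced on this page. The sketch below therefore assumes one commonly used definition: the largest gap, along the straight line between the two parameter vectors, between the interpolated model's loss and the linear interpolation of the endpoint losses. The names `interpolate`, `loss_barrier`, and `num_points` are hypothetical and chosen for illustration, and a flat list of floats stands in for real model weights; the paper's actual formula and evaluation code may differ.

```python
from typing import Callable, List, Sequence


def interpolate(theta_a: Sequence[float], theta_b: Sequence[float], lam: float) -> List[float]:
    """Return (1 - lam) * theta_a + lam * theta_b, element-wise."""
    return [(1.0 - lam) * a + lam * b for a, b in zip(theta_a, theta_b)]


def loss_barrier(
    loss_fn: Callable[[Sequence[float]], float],
    theta_a: Sequence[float],
    theta_b: Sequence[float],
    num_points: int = 11,
) -> float:
    """Assumed definition (not taken from the paper): the maximum over an evenly
    spaced grid of lam in [0, 1] of
        loss(interpolated params) - ((1 - lam) * loss(theta_a) + lam * loss(theta_b)).
    """
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    loss_a = loss_fn(theta_a)
    loss_b = loss_fn(theta_b)
    barrier = float("-inf")
    for i in range(num_points):
        lam = i / (num_points - 1)
        loss_on_path = loss_fn(interpolate(theta_a, theta_b, lam))
        baseline = (1.0 - lam) * loss_a + lam * loss_b
        barrier = max(barrier, loss_on_path - baseline)
    return barrier


def double_well(theta: Sequence[float]) -> float:
    """Toy one-parameter loss with two separate minima at -1 and +1."""
    return (theta[0] ** 2 - 1.0) ** 2


if __name__ == "__main__":
    # The two minima are separated by a bump at 0, so the barrier is positive.
    print(loss_barrier(double_well, [-1.0], [1.0]))
```

In practice `loss_fn` would evaluate a network on the test split, and a barrier near zero along the path is what the quoted text describes as a low- or zero-loss linear connection.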