Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces
Authors: Daniel Marczak, Simone Magistri, Sebastian Cygert, Bartłomiej Twardowski, Andrew D. Bagdanov, Joost van de Weijer
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our proposed approach achieves state-of-the-art performance on vision and language tasks across various sets of tasks and model scales. [...] Table 1 presents our main results for multi-task model merging. |
| Researcher Affiliation | Collaboration | 1Warsaw University of Technology, Poland 2IDEAS NCBR, Warsaw, Poland 3Department of Information Engineering, Media Integration and Communication Center (MICC), University of Florence, Italy 4NASK PIB, National Research Institute, Warsaw, Poland 5Gdańsk University of Technology, Poland 6IDEAS Research Center, Warsaw, Poland 7Computer Vision Center, Barcelona, Spain 8Department of Computer Science, Universitat Autònoma de Barcelona, Spain. Correspondence to: Daniel Marczak <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Iso-C: Isotropic Merging in Common Subspace [...] Algorithm 2 Iso-CTS: Isotropic Merging in Common and Task-Specific Subspaces |
| Open Source Code | Yes | 1The code is available at https://github.com/danielm1405/iso-merging. |
| Open Datasets | Yes | The 8-dataset benchmark consists of: Cars (Krause et al., 2013), DTD (Cimpoi et al., 2014), EuroSAT (Helber et al., 2019), GTSRB (Stallkamp et al., 2011), MNIST (LeCun et al., 1998), RESISC45 (Cheng et al., 2017), SUN397 (Xiao et al., 2016), and SVHN (Netzer et al., 2011). |
| Dataset Splits | Yes | We evaluate our approaches over sets of 8, 14, and 20 datasets, following Wang et al. (2024b). We use the checkpoints fine-tuned on the tasks above, provided in Wang et al. (2024b) [...] We use codebase and checkpoints provided by KnOTS: ViT-B/32 and ViT-L/14 fine-tuned with rank 16 LoRA (Hu et al., 2021) on 8 vision tasks. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions several models and frameworks (e.g., CLIP, LoRA, T5-Large-LM-Adapt) but does not provide specific version numbers for these or other software dependencies used in the experiments. |
| Experiment Setup | Yes | Table 5. Optimal α value chosen on a held-out validation set for different model types and numbers of tasks for Iso-C and Iso-CTS. [...] The optimal fraction of common subspace k/r = 0.8, and we use this as a default value for Iso-CTS across all settings. |
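The pseudocode quoted above (Algorithm 1, Iso-C) is not reproduced on this page, but its core idea of making the singular-value spectrum of the merged task matrix isotropic can be sketched as follows. This is a minimal illustration, not the authors' exact procedure: the function name `iso_merge`, the plain sum over task vectors, and the per-matrix treatment are assumptions made for brevity; consult the linked repository for the real implementation.

```python
import numpy as np

def iso_merge(task_vectors):
    """Sketch of isotropic merging in a common subspace.

    task_vectors: list of weight-delta matrices (one per fine-tuned task),
    all with the same shape. Returns a single merged delta whose
    singular-value spectrum has been flattened to its mean.
    """
    # Combine task-specific deltas into one common matrix (illustrative choice).
    merged = sum(task_vectors)
    # Decompose the merged delta into its singular directions.
    U, s, Vt = np.linalg.svd(merged, full_matrices=False)
    # Isotropic step: replace every singular value with the spectrum's mean,
    # so no single task direction dominates the merged model.
    s_iso = np.full_like(s, s.mean())
    return U @ np.diag(s_iso) @ Vt
```

In practice the merged delta would then be added to the pretrained weights with the validation-tuned scaling coefficient α reported in Table 5 (roughly θ = θ₀ + α·Δ), and Iso-CTS would additionally reserve a fraction 1 − k/r of the spectrum for task-specific directions.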