Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces
Authors: Daniel Marczak, Simone Magistri, Sebastian Cygert, Bartłomiej Twardowski, Andrew D. Bagdanov, Joost van de Weijer
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our proposed approach achieves state-of-the-art performance on vision and language tasks across various sets of tasks and model scales. [...] Table 1 presents our main results for multi-task model merging. |
| Researcher Affiliation | Collaboration | 1Warsaw University of Technology, Poland 2IDEAS NCBR, Warsaw, Poland 3Department of Information Engineering, Media Integration and Communication Center (MICC), University of Florence, Italy 4NASK PIB, National Research Institute, Warsaw, Poland 5Gdańsk University of Technology, Poland 6IDEAS Research Center, Warsaw, Poland 7Computer Vision Center, Barcelona, Spain 8Department of Computer Science, Universitat Autònoma de Barcelona, Spain. Correspondence to: Daniel Marczak <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Iso-C: Isotropic Merging in Common Subspace [...] Algorithm 2 Iso-CTS: Isotropic Merging in Common and Task-Specific Subspaces |
| Open Source Code | Yes | 1The code is available at https://github.com/danielm1405/iso-merging. |
| Open Datasets | Yes | The 8-dataset benchmark consists of: Cars (Krause et al., 2013), DTD (Cimpoi et al., 2014), EuroSAT (Helber et al., 2019), GTSRB (Stallkamp et al., 2011), MNIST (LeCun et al., 1998), RESISC45 (Cheng et al., 2017), SUN397 (Xiao et al., 2016), and SVHN (Netzer et al., 2011). |
| Dataset Splits | Yes | We evaluate our approaches over sets of 8, 14, and 20 datasets, following Wang et al. (2024b). We use the checkpoints fine-tuned on the tasks above, provided in Wang et al. (2024b) [...] We use codebase and checkpoints provided by KnOTS: ViT-B/32 and ViT-L/14 fine-tuned with rank 16 LoRA (Hu et al., 2021) on 8 vision tasks. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions several models and frameworks (e.g., CLIP, LoRA, T5-Large-LM-Adapt) but does not provide specific version numbers for these or other software dependencies used in the experiments. |
| Experiment Setup | Yes | Table 5. Optimal α value chosen on a held-out validation set for different model types and numbers of tasks for Iso-C and Iso-CTS. [...] The optimal fraction of common subspace k/r = 0.8, and we use this as a default value for Iso-CTS across all settings. |
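The pseudocode quoted above (Algorithm 1, Iso-C) is not reproduced on this page, but its core idea of making the singular-value spectrum of the merged task matrix isotropic can be sketched as follows. This is a minimal illustration, not the authors' exact procedure: the function name `iso_merge`, the plain sum over task vectors, and the per-matrix treatment are assumptions made for brevity; consult the linked repository for the real implementation.

```python
import numpy as np

def iso_merge(task_vectors):
    """Sketch of isotropic merging in a common subspace.

    task_vectors: list of weight-delta matrices (one per fine-tuned task),
    all with the same shape. Returns a single merged delta whose
    singular-value spectrum has been flattened to its mean.
    """
    # Combine task-specific deltas into one common matrix (illustrative choice).
    merged = sum(task_vectors)
    # Decompose the merged delta into its singular directions.
    U, s, Vt = np.linalg.svd(merged, full_matrices=False)
    # Isotropic step: replace every singular value with the spectrum's mean,
    # so no single task direction dominates the merged model.
    s_iso = np.full_like(s, s.mean())
    return U @ np.diag(s_iso) @ Vt
```

In practice the merged delta would then be added to the pretrained weights with the validation-tuned scaling coefficient α reported in Table 5 (roughly θ = θ₀ + α·Δ), and Iso-CTS would additionally reserve a fraction 1 − k/r of the spectrum for task-specific directions.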