AdaMerging: Adaptive Model Merging for Multi-Task Learning
Authors: Enneng Yang, Zhenyi Wang, Li Shen, Shiwei Liu, Guibing Guo, Xingwei Wang, Dacheng Tao
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental findings across eight tasks demonstrate the efficacy of the AdaMerging scheme we put forth. Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance. |
| Researcher Affiliation | Collaboration | 1 Northeastern University, China; 2 University of Maryland, USA; 3 JD Explore Academy, China; 4 University of Oxford, UK; 5 Nanyang Technological University, Singapore |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (e.g., labeled 'Algorithm' or 'Pseudocode'). |
| Open Source Code | Yes | The code is available at AdaMerging. |
| Open Datasets | Yes | Following Ilharco et al. (2023) and Yadav et al. (2023), we study task vectors based multi-task model merging on eight image classification datasets: SUN397 (Xiao et al., 2016), Cars (Krause et al., 2013), RESISC45 (Cheng et al., 2017), EuroSAT (Helber et al., 2019), SVHN (Yuval, 2011), GTSRB (Stallkamp et al., 2011), MNIST (LeCun, 1998), DTD (Cimpoi et al., 2014). |
| Dataset Splits | Yes | Stanford Cars (Cars) (Krause et al., 2013) is a car classification dataset, which contains 196 classes of cars and 16,185 images in total. Each class is split between the training set and the test set at a roughly 1:1 ratio. |
| Hardware Specification | Yes | In Tab. 12, we show the performance that AdaMerging can achieve under different training costs (based on a single GeForce RTX 3090). |
| Software Dependencies | No | The paper mentions the 'Adam optimizer' and 'PyTorch', but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We use an Adam optimizer (Kingma & Ba, 2014) to update the merging coefficients, with the learning rate set to 0.001, the momentum to (0.9, 0.999), and the batch size to 16. To avoid significantly increasing training costs, we only train for 500 iterations to update the merging coefficients. (A minimal sketch of this setup follows the table.) |
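
For orientation, below is a minimal, self-contained sketch (written for this report, not the authors' released code) of the training loop the "Experiment Setup" row quotes: the per-task merging coefficients are the only learnable parameters and are updated with Adam (lr 0.001, momentum (0.9, 0.999), batch size 16, 500 iterations). The toy backbone, random task vectors, and random "unlabeled test" batches are placeholder assumptions; AdaMerging's actual objective minimizes the entropy of the merged model's predictions on unlabeled test data, and the paper also has a layer-wise variant with one coefficient per task per layer.

```python
# Hedged sketch of task-wise AdaMerging: learn merging coefficients lambda_k
# for theta_merged = theta_pre + sum_k lambda_k * tau_k by entropy minimization.
# Backbone, task vectors, and data below are random stand-ins, not the paper's.
import torch
import torch.nn as nn
from torch.func import functional_call

torch.manual_seed(0)
num_tasks, in_dim, num_classes, batch_size = 8, 32, 10, 16

# Toy "pre-trained" backbone standing in for the CLIP image encoder + head.
model = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, num_classes))
pretrained_sd = {k: v.detach().clone() for k, v in model.state_dict().items()}

# Task vectors tau_k = theta_finetuned_k - theta_pre (random stand-ins here).
task_vectors = [
    {k: 0.01 * torch.randn_like(v) for k, v in pretrained_sd.items()}
    for _ in range(num_tasks)
]

# One learnable coefficient per task, initialized to 0.3 (a common
# task-arithmetic scaling); these are the only trainable parameters.
lambdas = nn.Parameter(torch.full((num_tasks,), 0.3))
optimizer = torch.optim.Adam([lambdas], lr=1e-3, betas=(0.9, 0.999))

def merged_weights():
    """theta_merged = theta_pre + sum_k lambda_k * tau_k."""
    return {
        name: w + sum(lambdas[k] * task_vectors[k][name] for k in range(num_tasks))
        for name, w in pretrained_sd.items()
    }

for step in range(500):  # 500 iterations, as quoted from the paper
    optimizer.zero_grad()
    merged = merged_weights()
    loss = 0.0
    for k in range(num_tasks):
        x = torch.randn(batch_size, in_dim)       # unlabeled test batch (placeholder)
        logits = functional_call(model, merged, (x,))
        probs = logits.softmax(dim=-1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
        loss = loss + entropy                      # entropy-minimization objective
    loss.backward()
    optimizer.step()
```

In this sketch only the coefficient vector receives gradients; the pre-trained weights and task vectors stay fixed, which is what keeps the adaptation cost small relative to fine-tuning the full model.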