AdaMerging: Adaptive Model Merging for Multi-Task Learning

Authors: Enneng Yang, Zhenyi Wang, Li Shen, Shiwei Liu, Guibing Guo, Xingwei Wang, Dacheng Tao

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental findings across eight tasks demonstrate the efficacy of the AdaMerging scheme we put forth. Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
Researcher Affiliation | Collaboration | 1 Northeastern University, China; 2 University of Maryland, USA; 3 JD Explore Academy, China; 4 University of Oxford, UK; 5 Nanyang Technological University, Singapore
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (e.g., labeled 'Algorithm' or 'Pseudocode').
Open Source Code | Yes | The code is available at AdaMerging.
Open Datasets | Yes | Following Ilharco et al. (2023) and Yadav et al. (2023), we study task-vector-based multi-task model merging on eight image classification datasets: SUN397 (Xiao et al., 2016), Cars (Krause et al., 2013), RESISC45 (Cheng et al., 2017), EuroSAT (Helber et al., 2019), SVHN (Yuval, 2011), GTSRB (Stallkamp et al., 2011), MNIST (LeCun, 1998), DTD (Cimpoi et al., 2014).
Dataset Splits | Yes | Stanford Cars (Cars) (Krause et al., 2013) is a car classification dataset, which contains 196 classes of cars and a total of 16,185 images. Each class in the training set and test set is divided at a ratio of 1:1.
Hardware Specification | Yes | As shown in Tab. 12, we show the performance that AdaMerging can achieve under different training costs (based on a single GeForce RTX 3090).
Software Dependencies | No | The paper mentions the 'Adam optimizer' and 'PyTorch', but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | We use an Adam optimizer (Kingma & Ba, 2014) to update the merging coefficients, with the learning rate set to 0.001, the momentum to (0.9, 0.999), and the batch size to 16. To avoid significantly increasing training costs, we only trained 500 iterations to update the merging coefficient.
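Based on the experiment setup quoted in the last row above, the sketch below shows one plausible way the merging coefficients could be optimized: merge the fine-tuned models' task vectors into the pretrained backbone with learnable coefficients, then update those coefficients with Adam (lr 0.001, betas (0.9, 0.999), batch size 16, 500 iterations). The entropy-minimization objective on unlabeled test batches, the single shared classification head, the 0.3 initialization, and all helper names (`pretrained_state`, `task_vectors`, `test_loader`) are illustrative assumptions rather than details confirmed by the excerpt; this is a minimal sketch, not the authors' implementation.

```python
# Minimal sketch (PyTorch 2.x), not the authors' code.
import torch


def merge_params(pretrained, task_vectors, lambdas):
    """theta = theta_0 + sum_k lambda_k * tau_k  (task-arithmetic-style merge)."""
    merged = {}
    for name, p0 in pretrained.items():
        # Buffers or parameters without a task-vector entry are kept as-is.
        delta = sum(lam * tv[name] for lam, tv in zip(lambdas, task_vectors) if name in tv)
        merged[name] = p0 + delta
    return merged


def adamerging(model, pretrained_state, task_vectors, test_loader, steps=500):
    # One learnable coefficient per task (task-wise variant), initialized
    # near the common task-arithmetic default of 0.3 (assumption).
    lambdas = torch.nn.Parameter(torch.full((len(task_vectors),), 0.3))
    opt = torch.optim.Adam([lambdas], lr=1e-3, betas=(0.9, 0.999))

    data_iter = iter(test_loader)  # unlabeled test batches, batch size 16
    for _ in range(steps):
        try:
            x, _ = next(data_iter)          # labels, if any, are ignored
        except StopIteration:
            data_iter = iter(test_loader)
            x, _ = next(data_iter)

        # Re-merge with the current coefficients and run a forward pass
        # through the merged weights without mutating `model` in place.
        merged_state = merge_params(pretrained_state, task_vectors, lambdas)
        logits = torch.func.functional_call(model, merged_state, (x,))

        # Unsupervised objective (assumed): mean Shannon entropy of predictions.
        probs = logits.softmax(dim=-1)
        loss = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()

        opt.zero_grad()
        loss.backward()
        opt.step()

    return lambdas.detach()
```

A layer-wise variant would follow the same loop with one coefficient per task per layer (a `lambdas` tensor of shape `(num_tasks, num_layers)`) instead of a single coefficient per task.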