AdaMerging: Adaptive Model Merging for Multi-Task Learning

Authors: Enneng Yang, Zhenyi Wang, Li Shen, Shiwei Liu, Guibing Guo, Xingwei Wang, Dacheng Tao

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental findings across eight tasks demonstrate the efficacy of the AdaMerging scheme we put forth. Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
Researcher Affiliation | Collaboration | 1 Northeastern University, China; 2 University of Maryland, USA; 3 JD Explore Academy, China; 4 University of Oxford, UK; 5 Nanyang Technological University, Singapore
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (e.g., labeled 'Algorithm' or 'Pseudocode').
Open Source Code | Yes | The code is available at AdaMerging.
Open Datasets | Yes | Following Ilharco et al. (2023) and Yadav et al. (2023), we study task-vector-based multi-task model merging on eight image classification datasets: SUN397 (Xiao et al., 2016), Cars (Krause et al., 2013), RESISC45 (Cheng et al., 2017), EuroSAT (Helber et al., 2019), SVHN (Yuval, 2011), GTSRB (Stallkamp et al., 2011), MNIST (LeCun, 1998), DTD (Cimpoi et al., 2014).
Dataset Splits | Yes | Stanford Cars (Cars) (Krause et al., 2013) is a car classification dataset, which contains 196 classes of cars and a total of 16,185 images. Each class in the training set and test set is divided at a ratio of 1:1.
Hardware Specification | Yes | As shown in Tab. 12, we show the performance that AdaMerging can achieve under different training costs (based on a single GeForce RTX 3090).
Software Dependencies | No | The paper mentions the 'Adam optimizer' and 'PyTorch', but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | We use an Adam optimizer (Kingma & Ba, 2014) to update the merging coefficients, with the learning rate set to 0.001, the momentum to (0.9, 0.999), and the batch size to 16. To avoid significantly increasing training costs, we only trained 500 iterations to update the merging coefficient.
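Based on the experiment setup quoted in the last row above, the sketch below shows one plausible way the merging coefficients could be optimized: merge the fine-tuned models' task vectors into the pretrained backbone with learnable coefficients, then update those coefficients with Adam (lr 0.001, betas (0.9, 0.999), batch size 16, 500 iterations). The entropy-minimization objective on unlabeled test batches, the single shared classification head, the 0.3 initialization, and all helper names (`pretrained_state`, `task_vectors`, `test_loader`) are illustrative assumptions rather than details confirmed by the excerpt; this is a minimal sketch, not the authors' implementation.

```python
# Minimal sketch (PyTorch 2.x), not the authors' code.
import torch


def merge_params(pretrained, task_vectors, lambdas):
    """theta = theta_0 + sum_k lambda_k * tau_k  (task-arithmetic-style merge)."""
    merged = {}
    for name, p0 in pretrained.items():
        # Buffers or parameters without a task-vector entry are kept as-is.
        delta = sum(lam * tv[name] for lam, tv in zip(lambdas, task_vectors) if name in tv)
        merged[name] = p0 + delta
    return merged


def adamerging(model, pretrained_state, task_vectors, test_loader, steps=500):
    # One learnable coefficient per task (task-wise variant), initialized
    # near the common task-arithmetic default of 0.3 (assumption).
    lambdas = torch.nn.Parameter(torch.full((len(task_vectors),), 0.3))
    opt = torch.optim.Adam([lambdas], lr=1e-3, betas=(0.9, 0.999))

    data_iter = iter(test_loader)  # unlabeled test batches, batch size 16
    for _ in range(steps):
        try:
            x, _ = next(data_iter)          # labels, if any, are ignored
        except StopIteration:
            data_iter = iter(test_loader)
            x, _ = next(data_iter)

        # Re-merge with the current coefficients and run a forward pass
        # through the merged weights without mutating `model` in place.
        merged_state = merge_params(pretrained_state, task_vectors, lambdas)
        logits = torch.func.functional_call(model, merged_state, (x,))

        # Unsupervised objective (assumed): mean Shannon entropy of predictions.
        probs = logits.softmax(dim=-1)
        loss = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()

        opt.zero_grad()
        loss.backward()
        opt.step()

    return lambdas.detach()
```

A layer-wise variant would follow the same loop with one coefficient per task per layer (a `lambdas` tensor of shape `(num_tasks, num_layers)`) instead of a single coefficient per task.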