Localizing Task Information for Improved Model Merging and Compression

Authors: Ke Wang, Nikolaos Dimitriadis, Guillermo Ortiz-Jimenez, François Fleuret, Pascal Frossard

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments in vision and NLP benchmarks with up to 20 tasks, show that Consensus Merging consistently improves existing approaches. Furthermore, our proposed compression scheme reduces storage from 57Gb to 8.2Gb while retaining 99.7% of original performance.
Researcher Affiliation | Collaboration | 1 École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; 2 Google DeepMind; 3 Work done while at EPFL; 4 University of Geneva, Geneva, Switzerland.
Pseudocode | No | The paper describes algorithms using equations and text but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | The source code can be found at https://github.com/nik-dim/tall_masks.
Open Datasets | Yes | For the 8-task vision benchmark proposed by Ilharco et al. (2023), we randomly select a subset of weights for each task and perform gradient updates only for those parameters...
Dataset Splits | Yes | The results for this control experiment are presented in Table 1, compared with task arithmetic where the models are fine-tuned in a standard way. Looking at the normalized accuracy, defined in Appendix A, we observe that the performance of task arithmetic in the controlled setting deteriorates at the same rate as standard fine-tuning, where the accuracy of the merged model is 2.7% worse than standard case...We validate the efficacy of our mask construction by checking if the original performance in the same 8-task computer vision benchmark, evaluated on a held-out dataset, can be restored...Note that λt is selected based on the validation accuracy of each task respectively, allowing for the task-specific problems to be solved in parallel and independently.
Hardware Specification | Yes | All our experiments were performed using the same hardware consisting of four V100 NVIDIA GPUs with 32GB of memory each.
Software Dependencies | No | The paper mentions software such as the "AdamW optimizer" and "CLIP model variants" but does not specify version numbers, which are required for reproducibility.
Experiment Setup | Yes | Specifically, we fine-tune the same pre-trained CLIP checkpoint obtained from the openclip repository (Ilharco et al., 2021). We fine-tune for 2,000 iterations, using a batch size of 128, a learning rate of 1e-5, and a cosine annealing learning rate schedule with 200 warm-up steps, along with the AdamW optimizer...For constructing task-specific masks, we tune the hyper-parameter λ for each task over {0.2, 0.3, 0.4, 0.5, 0.6}...The scaling factor is tuned over a range of {0.0, 0.1, ..., 0.9, 1.0}, selected based on the performance on the validation set averaged on all tasks.
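To make the quoted fine-tuning recipe concrete, the minimal PyTorch sketch below assembles the stated hyper-parameters (2,000 iterations, batch size 128, learning rate 1e-5, AdamW, cosine annealing with 200 warm-up steps). This is a sketch under assumptions, not the authors' code: the model and data are dummy stand-ins for the open_clip checkpoint and the task datasets, and the warm-up/cosine composition is one plausible realization of the stated schedule.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

# Dummy stand-in for the fine-tuned CLIP encoder; the paper fine-tunes an
# open_clip checkpoint (Ilharco et al., 2021).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10))

TOTAL_ITERS, WARMUP_ITERS, BATCH_SIZE = 2_000, 200, 128
optimizer = AdamW(model.parameters(), lr=1e-5)

# 200 linear warm-up steps followed by cosine annealing for the remainder.
scheduler = SequentialLR(
    optimizer,
    schedulers=[
        LinearLR(optimizer, start_factor=0.01, total_iters=WARMUP_ITERS),
        CosineAnnealingLR(optimizer, T_max=TOTAL_ITERS - WARMUP_ITERS),
    ],
    milestones=[WARMUP_ITERS],
)

for step in range(TOTAL_ITERS):
    # Random batch as a placeholder for the task's training split.
    images = torch.randn(BATCH_SIZE, 3, 224, 224)
    labels = torch.randint(0, 10, (BATCH_SIZE,))
    loss = nn.functional.cross_entropy(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```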
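The two hyper-parameter searches described above (a per-task mask threshold λ selected on each task's own validation accuracy, and a single scaling factor selected on validation accuracy averaged over all tasks) amount to a plain grid search, sketched below. The `val_accuracy` callable is hypothetical; the actual evaluation logic lives in the linked repository.

```python
# Hedged sketch of the quoted hyper-parameter selection. Assumes a hypothetical
# callable val_accuracy(task, lam=None, alpha=None) -> float that evaluates the
# masked / merged model on that task's validation split.
LAMBDA_GRID = [0.2, 0.3, 0.4, 0.5, 0.6]              # per-task mask threshold λ
ALPHA_GRID = [round(0.1 * i, 1) for i in range(11)]  # scaling factor over {0.0, ..., 1.0}


def select_lambda_per_task(tasks, val_accuracy):
    # λ is tuned independently for each task on that task's own validation set,
    # so the searches can run in parallel.
    return {
        task: max(LAMBDA_GRID, key=lambda lam: val_accuracy(task, lam=lam))
        for task in tasks
    }


def select_global_alpha(tasks, val_accuracy):
    # The scaling factor is shared across tasks and picked by the validation
    # accuracy averaged over all tasks.
    return max(
        ALPHA_GRID,
        key=lambda alpha: sum(val_accuracy(t, alpha=alpha) for t in tasks) / len(tasks),
    )
```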