ZipIt! Merging Models from Different Tasks without Training

Authors: George Stoica, Daniel Bolya, Jakob Brandt Bjorner, Pratik Ramesh, Taylor Hearn, Judy Hoffman

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate our approach by merging models trained on entirely disjoint sets of CIFAR (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009) categories, as well as merging several models trained on completely independent datasets into one, significantly outperforming prior work (Sec. 5). Finally, we ablate and analyze our method's capabilities on these scenarios (Sec. 6)."
Researcher Affiliation | Academia | George Stoica, Daniel Bolya, Jakob Bjorner, Pratik Ramesh, Taylor Hearn, Judy Hoffman (Georgia Tech); {gstoica3,dbolya,jbjorner3,pramesh39,thearn6,judy}@gatech.edu
Pseudocode | No | The paper describes procedural steps for the Zip operation, zip propagation, and matching algorithms (e.g., 'By default, we do this greedily: i.e., iteratively match the features with the highest correlation without replacement'), but these are not presented within a formally labeled 'Pseudocode' or 'Algorithm' block. (A minimal sketch of this greedy matching appears below the table.)
Open Source Code | Yes | Code: https://github.com/gstoica27/ZipIt
Open Datasets | Yes | "We validate our approach by merging models trained on entirely disjoint sets of CIFAR (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009) categories, as well as merging several models trained on completely independent datasets into one, significantly outperforming prior work (Sec. 5)."
Dataset Splits | Yes | "For each experiment where we sample multiple disjoint splits of categories, we hold one split out for hyperparameter search and report mean and standard deviation on the rest. For experiments with models trained on different datasets, we subsample the validation set into a validation and test set to use for the same purpose." (A sketch of this splitting protocol appears below the table.)
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for the experiments, such as GPU models, CPU types, or cloud computing instances.
Software Dependencies | No | The paper does not explicitly mention specific software dependencies with their version numbers required to reproduce the experiments.
Experiment Setup | Yes | "For a fair comparison, we reset the batch norms for all methods (including the original models) using the training data (following the recommendation in Jordan et al., 2022). For our method, ZipIt!n/m indicates that n out of the m layers in the network have been zipped (Sec. 4.3). Note, all our models have different initializations. ... We train 5 pairs of ResNet-20 (He et al., 2016) from scratch with different initializations on disjoint halves of the CIFAR-10 and CIFAR-100 classes (Krizhevsky et al., 2009). While ZipIt! supports partial zipping to merge models with different outputs (in this case, disjoint label sets), prior methods without retraining do not. To make a fair comparison, we train these CIFAR models with a CLIP-style loss (Radford et al., 2021) using CLIP text encodings of the class names as targets." (Sketches of the batch-norm reset and CLIP-style loss appear below the table.)
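
Since the paper provides no formally labeled pseudocode (Pseudocode row), here is a minimal Python sketch of the greedy matching it describes: iteratively pair the features with the highest correlation, without replacement. The function name `greedy_match` and the assumption that the input is a precomputed (n, m) correlation matrix are ours, not the authors'.

```python
import torch

def greedy_match(corr: torch.Tensor) -> list[tuple[int, int]]:
    """Greedily pair features by highest correlation, without replacement.

    corr: (n, m) correlation matrix between two sets of features.
    Returns a list of (i, j) index pairs. Illustrative sketch only,
    not the authors' reference implementation.
    """
    corr = corr.clone()  # don't mutate the caller's matrix
    pairs = []
    for _ in range(min(corr.shape)):
        flat = torch.argmax(corr).item()  # highest remaining correlation
        i, j = flat // corr.shape[1], flat % corr.shape[1]
        pairs.append((i, j))
        corr[i, :] = float("-inf")  # feature i is taken
        corr[:, j] = float("-inf")  # feature j is taken
    return pairs
```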
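
The Dataset Splits row describes sampling multiple disjoint category splits and holding one out for hyperparameter search. A hedged sketch of that protocol, with the sampling details (shuffling, half/half partition, which split is held out) assumed:

```python
import random

def disjoint_class_splits(num_classes: int, num_splits: int, seed: int = 0):
    """Sample `num_splits` partitions of the classes into disjoint halves.

    Sketch of the splitting protocol quoted above; exact sampling
    details are assumptions.
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(num_splits):
        classes = list(range(num_classes))
        rng.shuffle(classes)
        half = num_classes // 2
        splits.append((sorted(classes[:half]), sorted(classes[half:])))
    # Hold one split out for hyperparameter search; report mean/std on the rest.
    holdout, eval_splits = splits[0], splits[1:]
    return holdout, eval_splits
```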
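
The Experiment Setup row quotes resetting batch norms on the training data, following Jordan et al. (2022). In PyTorch this is commonly done by clearing each BatchNorm layer's running statistics and re-estimating them with forward passes in train mode; the sketch below assumes a standard (inputs, labels) data loader and a batch budget, neither of which the paper specifies.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def reset_bn_stats(model: nn.Module, loader, device="cuda", max_batches=100):
    """Re-estimate BatchNorm running statistics on training data.

    Sketch of the batch-norm reset cited from Jordan et al. (2022);
    the loader format and batch budget are assumptions.
    """
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.reset_running_stats()  # zero the running mean/var
            m.momentum = None        # use a cumulative moving average instead
    model.train()  # BN only updates running stats in train mode
    for i, (x, _) in enumerate(loader):
        if i >= max_batches:
            break
        model(x.to(device))
    model.eval()
    return model
```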
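
Finally, the same row mentions training the CIFAR models with a CLIP-style loss against CLIP text encodings of the class names. A minimal sketch of such a loss; the temperature value and normalization choices are assumptions, as the paper does not spell them out:

```python
import torch.nn.functional as F

def clip_style_loss(image_feats, text_targets, labels, temperature=0.07):
    """CLIP-style classification loss against fixed text encodings.

    image_feats:  (B, D) outputs of the image model.
    text_targets: (C, D) CLIP text encodings of the C class names.
    labels:       (B,)  integer class indices.
    Hedged sketch; hyperparameters here are illustrative.
    """
    image_feats = F.normalize(image_feats, dim=-1)
    text_targets = F.normalize(text_targets, dim=-1)
    logits = image_feats @ text_targets.T / temperature
    return F.cross_entropy(logits, labels)
```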