$C^2M^3$: Cycle-Consistent Multi-Model Merging
Authors: Donato Crisostomi, Marco Fumero, Daniele Baieri, Florian Bernard, Emanuele Rodolà
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We qualitatively and quantitatively motivate the need for such a constraint, showing its benefits when merging sets of models in scenarios spanning varying architectures and datasets. |
| Researcher Affiliation | Academia | Donato Crisostomi, Sapienza University of Rome, crisostomi@di.uniroma1.it; Marco Fumero, Institute of Science and Technology Austria, fumero@di.uniroma1.it; Daniele Baieri, Sapienza University of Rome, baieri@di.uniroma1.it; Florian Bernard, University of Bonn, fb@uni-bonn.de; Emanuele Rodolà, Sapienza University of Rome, rodola@di.uniroma1.it |
| Pseudocode | Yes | Algorithm 1 Frank-Wolfe for n-Model Matching |
| Open Source Code | Yes | Finally, to foster reproducible research in the field, we release a modular and reusable codebase containing implementations of our approach and the considered baselines.1 1https://github.com/crisostomi/cycle-consistent-model-merging |
| Open Datasets | Yes | We employ the most common datasets for image classification tasks: MNIST [9], CIFAR-10 [23], EMNIST [7] and CIFAR-100 [23], having 10, 10, 26 and 100 classes respectively. We use the standard train-test splits provided by torchvision for all datasets. |
| Dataset Splits | Yes | We use the standard train-test splits provided by torchvision for all datasets. |
| Hardware Specification | Yes | All of the experiments were carried out using consumer hardware, in particular mostly on a 32GiB RAM machine with a 12th Gen Intel(R) Core(TM) i7-12700F processor and an Nvidia RTX 3090 GPU, except for some of the experiments that were carried on a 2080. |
| Software Dependencies | No | The paper mentions software such as "PyTorch", "PyTorch Lightning", and "NN-Template" but does not specify their version numbers, which are needed for reproducible software dependencies. |
| Experiment Setup | Yes | In particular, we train most of our models with a batch size of 100 for 250 epochs, using SGD with momentum 0.9, a learning rate of 0.1, and a weight decay of $10^{-4}$. We use a cosine annealing learning rate scheduler with a warm restart period of 10 epochs and a minimum learning rate of 0. |
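The Pseudocode row above refers to Algorithm 1, a Frank-Wolfe procedure for n-model matching. The paper's algorithm is not reproduced here; the snippet below is only a minimal sketch of the generic Frank-Wolfe-over-permutations pattern for matching two square weight matrices, assuming a quadratic alignment objective and using `scipy.optimize.linear_sum_assignment` as the linear-minimization oracle. All names are illustrative and do not come from the released codebase.

```python
# Illustrative sketch only: a generic Frank-Wolfe loop over the Birkhoff
# polytope, NOT the paper's Algorithm 1 (which jointly matches n models
# under a cycle-consistency constraint). Objective and names are assumed.
import numpy as np
from scipy.optimize import linear_sum_assignment


def frank_wolfe_match(W_a, W_b, n_iters=100):
    """Approximately maximize tr(W_a^T P W_b P^T) over permutations P.

    W_a, W_b are assumed to be square (d, d) matrices.
    """
    d = W_a.shape[0]
    P = np.full((d, d), 1.0 / d)  # start at the barycenter of the polytope
    for t in range(n_iters):
        # Gradient of the (maximized) objective tr(W_a^T P W_b P^T).
        grad = W_a @ P @ W_b.T + W_a.T @ P @ W_b
        # Linear-minimization oracle: the best ascent vertex for the
        # linearized objective is a permutation found by linear assignment.
        rows, cols = linear_sum_assignment(-grad)
        D = np.zeros_like(P)
        D[rows, cols] = 1.0
        gamma = 2.0 / (t + 2.0)  # standard Frank-Wolfe step size
        P = (1.0 - gamma) * P + gamma * D
    # Project the final doubly-stochastic iterate back to a hard permutation.
    rows, cols = linear_sum_assignment(-P)
    P_hard = np.zeros_like(P)
    P_hard[rows, cols] = 1.0
    return P_hard
```

The actual method matches n models jointly while enforcing cycle consistency across composed permutations, which this two-model sketch does not capture.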
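The Open Datasets and Dataset Splits rows indicate that the standard torchvision train-test splits are used for MNIST, CIFAR-10, EMNIST, and CIFAR-100. A minimal loading sketch is given below; the transform and the EMNIST `letters` split are assumptions (chosen because `letters` is the EMNIST variant with 26 classes), not values taken from the released configuration.

```python
# Minimal sketch: loading the standard torchvision train/test splits.
from torchvision import datasets, transforms

transform = transforms.ToTensor()

mnist_train = datasets.MNIST("data", train=True, download=True, transform=transform)
mnist_test = datasets.MNIST("data", train=False, download=True, transform=transform)

cifar10_train = datasets.CIFAR10("data", train=True, download=True, transform=transform)
cifar10_test = datasets.CIFAR10("data", train=False, download=True, transform=transform)

# "letters" split assumed, since it is the 26-class EMNIST variant.
emnist_train = datasets.EMNIST("data", split="letters", train=True, download=True, transform=transform)
emnist_test = datasets.EMNIST("data", split="letters", train=False, download=True, transform=transform)

cifar100_train = datasets.CIFAR100("data", train=True, download=True, transform=transform)
cifar100_test = datasets.CIFAR100("data", train=False, download=True, transform=transform)
```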
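The Experiment Setup row maps almost directly onto standard PyTorch components. The sketch below is one plausible reading of that description: SGD with the reported hyperparameters and `CosineAnnealingWarmRestarts` with `T_0=10` and `eta_min=0`, stepped once per epoch. The placeholder model and random data are assumptions standing in for the paper's architectures and the torchvision datasets above.

```python
# Sketch of the reported optimization setup: batch size 100, 250 epochs,
# SGD + momentum 0.9, lr 0.1, weight decay 1e-4, cosine annealing with
# warm restarts every 10 epochs, minimum lr 0.
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and data (assumptions, for a self-contained example).
model = torch.nn.Linear(32 * 32 * 3, 100)
train_dataset = TensorDataset(torch.randn(1000, 32 * 32 * 3), torch.randint(0, 100, (1000,)))

loader = DataLoader(train_dataset, batch_size=100, shuffle=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10, eta_min=0.0)

for epoch in range(250):
    for x, y in loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # one scheduler step per epoch -> restart every 10 epochs
```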