$C^2M^3$: Cycle-Consistent Multi-Model Merging
Authors: Donato Crisostomi, Marco Fumero, Daniele Baieri, Florian Bernard, Emanuele Rodolà
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We qualitatively and quantitatively motivate the need for such a constraint, showing its benefits when merging sets of models in scenarios spanning varying architectures and datasets. |
| Researcher Affiliation | Academia | Donato Crisostomi, Sapienza University of Rome, crisostomi@di.uniroma1.it; Marco Fumero, Institute of Science and Technology Austria, fumero@di.uniroma1.it; Daniele Baieri, Sapienza University of Rome, baieri@di.uniroma1.it; Florian Bernard, University of Bonn, fb@uni-bonn.de; Emanuele Rodolà, Sapienza University of Rome, rodola@di.uniroma1.it |
| Pseudocode | Yes | Algorithm 1 Frank-Wolfe for n-Model Matching |
| Open Source Code | Yes | Finally, to foster reproducible research in the field, we release a modular and reusable codebase containing implementations of our approach and the considered baselines.1 1https://github.com/crisostomi/cycle-consistent-model-merging |
| Open Datasets | Yes | We employ the most common datasets for image classification tasks: MNIST [9], CIFAR-10 [23], EMNIST [7] and CIFAR-100 [23], having 10, 10, 26 and 100 classes respectively. We use the standard train-test splits provided by torchvision for all datasets. |
| Dataset Splits | Yes | We use the standard train-test splits provided by torchvision for all datasets. |
| Hardware Specification | Yes | All of the experiments were carried out using consumer hardware, in particular mostly on a 32GiB RAM machine with a 12th Gen Intel(R) Core(TM) i7-12700F processor and an Nvidia RTX 3090 GPU, except for some of the experiments that were carried on a 2080. |
| Software Dependencies | No | The paper mentions software such as "PyTorch", "PyTorch Lightning", and "NN-Template" but does not specify their version numbers, which are needed for reproducible software dependencies. |
| Experiment Setup | Yes | In particular, we train most of our models with a batch size of 100 for 250 epochs, using SGD with momentum 0.9, a learning rate of 0.1, and a weight decay of $10^{-4}$. We use a cosine annealing learning rate scheduler with a warm restart period of 10 epochs and a minimum learning rate of 0. |
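The Pseudocode row above refers to Algorithm 1, a Frank-Wolfe procedure for n-model matching. The paper's algorithm is not reproduced here; the snippet below is only a minimal sketch of the generic Frank-Wolfe-over-permutations pattern for matching two square weight matrices, assuming a quadratic alignment objective and using `scipy.optimize.linear_sum_assignment` as the linear-minimization oracle. All names are illustrative and do not come from the released codebase.

```python
# Illustrative sketch only: a generic Frank-Wolfe loop over the Birkhoff
# polytope, NOT the paper's Algorithm 1 (which jointly matches n models
# under a cycle-consistency constraint). Objective and names are assumed.
import numpy as np
from scipy.optimize import linear_sum_assignment


def frank_wolfe_match(W_a, W_b, n_iters=100):
    """Approximately maximize tr(W_a^T P W_b P^T) over permutations P.

    W_a, W_b are assumed to be square (d, d) matrices.
    """
    d = W_a.shape[0]
    P = np.full((d, d), 1.0 / d)  # start at the barycenter of the polytope
    for t in range(n_iters):
        # Gradient of the (maximized) objective tr(W_a^T P W_b P^T).
        grad = W_a @ P @ W_b.T + W_a.T @ P @ W_b
        # Linear-minimization oracle: the best ascent vertex for the
        # linearized objective is a permutation found by linear assignment.
        rows, cols = linear_sum_assignment(-grad)
        D = np.zeros_like(P)
        D[rows, cols] = 1.0
        gamma = 2.0 / (t + 2.0)  # standard Frank-Wolfe step size
        P = (1.0 - gamma) * P + gamma * D
    # Project the final doubly-stochastic iterate back to a hard permutation.
    rows, cols = linear_sum_assignment(-P)
    P_hard = np.zeros_like(P)
    P_hard[rows, cols] = 1.0
    return P_hard
```

The actual method matches n models jointly while enforcing cycle consistency across composed permutations, which this two-model sketch does not capture.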
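The Open Datasets and Dataset Splits rows indicate that the standard torchvision train-test splits are used for MNIST, CIFAR-10, EMNIST, and CIFAR-100. A minimal loading sketch is given below; the transform and the EMNIST `letters` split are assumptions (chosen because `letters` is the EMNIST variant with 26 classes), not values taken from the released configuration.

```python
# Minimal sketch: loading the standard torchvision train/test splits.
from torchvision import datasets, transforms

transform = transforms.ToTensor()

mnist_train = datasets.MNIST("data", train=True, download=True, transform=transform)
mnist_test = datasets.MNIST("data", train=False, download=True, transform=transform)

cifar10_train = datasets.CIFAR10("data", train=True, download=True, transform=transform)
cifar10_test = datasets.CIFAR10("data", train=False, download=True, transform=transform)

# "letters" split assumed, since it is the 26-class EMNIST variant.
emnist_train = datasets.EMNIST("data", split="letters", train=True, download=True, transform=transform)
emnist_test = datasets.EMNIST("data", split="letters", train=False, download=True, transform=transform)

cifar100_train = datasets.CIFAR100("data", train=True, download=True, transform=transform)
cifar100_test = datasets.CIFAR100("data", train=False, download=True, transform=transform)
```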
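The Experiment Setup row maps almost directly onto standard PyTorch components. The sketch below is one plausible reading of that description: SGD with the reported hyperparameters and `CosineAnnealingWarmRestarts` with `T_0=10` and `eta_min=0`, stepped once per epoch. The placeholder model and random data are assumptions standing in for the paper's architectures and the torchvision datasets above.

```python
# Sketch of the reported optimization setup: batch size 100, 250 epochs,
# SGD + momentum 0.9, lr 0.1, weight decay 1e-4, cosine annealing with
# warm restarts every 10 epochs, minimum lr 0.
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and data (assumptions, for a self-contained example).
model = torch.nn.Linear(32 * 32 * 3, 100)
train_dataset = TensorDataset(torch.randn(1000, 32 * 32 * 3), torch.randint(0, 100, (1000,)))

loader = DataLoader(train_dataset, batch_size=100, shuffle=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10, eta_min=0.0)

for epoch in range(250):
    for x, y in loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # one scheduler step per epoch -> restart every 10 epochs
```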