Model Fusion via Optimal Transport

Authors: Sidak Pal Singh, Martin Jaggi

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test our model fusion approach on standard image classification datasets, like CIFAR10 with commonly used convolutional neural networks (CNNs) such as VGG11 [23] and residual networks like ResNet18 [24]; and on MNIST, we use a fully connected network with 3 hidden layers of size 400, 200, 100, which we refer to as MLPNET. As baselines, we mention the performance of prediction ensembling and vanilla averaging, besides that of individual models. (A hedged sketch of the MLPNET architecture follows this table.)
Researcher Affiliation | Academia | Sidak Pal Singh, ETH Zurich, Switzerland, contact@sidakpal.com; Martin Jaggi, EPFL, Switzerland, martin.jaggi@epfl.ch
Pseudocode | Yes | Algorithm 1: Model Fusion (with ψ = {acts, wts} alignment). (A simplified sketch of the weight-based variant follows this table.)
Open Source Code | Yes | The code is available at the following link: https://github.com/sidak/otfusion.
Open Datasets | Yes | We test our model fusion approach on standard image classification datasets, like CIFAR10 with commonly used convolutional neural networks (CNNs) such as VGG11 [23] and residual networks like ResNet18 [24]; and on MNIST, we use a fully connected network with 3 hidden layers of size 400, 200, 100, which we refer to as MLPNET.
Dataset Splits | No | The paper mentions training and test sets and states that "All the performance scores are test accuracies," with full experimental details deferred to an appendix that was not provided. It does not explicitly specify validation dataset splits or percentages.
Hardware Specification | Yes | To give a concrete estimate, the time taken to fuse six VGG11 models is 15 seconds on 1 Nvidia V100 GPU (c.f. Section S1.4 for more details).
Software Dependencies | No | The paper mentions using "exact OT solvers" and optimization methods like SGD, but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions) that would be needed for replication.
Experiment Setup | Yes | We typically consider a mini-batch of 100 to 400 samples for these experiments. Both are trained on their portions of the data for 10 epochs, and other training settings are identical. The finetuning scores for vanilla and OT averaging correspond to their best obtained results, when retrained with several finetuning learning rate schedules for a total of 100 and 120 epochs in case of VGG11 and RESNET18 respectively. (These values are gathered in the config sketch after this table.)
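
For reference, the MLPNET described in the Research Type and Open Datasets rows (a fully connected MNIST network with hidden layers of size 400, 200, and 100) can be sketched in PyTorch as below. This is a minimal sketch: the ReLU activations, the flattened 28x28 input, the 10 output classes, and the class name are our assumptions for illustration and are not taken from the authors' otfusion repository.

```python
# Minimal sketch of the MLPNET architecture described in the paper:
# three hidden layers of size 400, 200, and 100 on MNIST.
# Assumptions (not from the paper excerpt): ReLU activations, flattened
# 28x28 input, 10 output classes, and the class name itself.
import torch
import torch.nn as nn

class MLPNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Flatten(),                  # 1x28x28 MNIST image -> 784-dim vector
            nn.Linear(784, 400), nn.ReLU(),
            nn.Linear(400, 200), nn.ReLU(),
            nn.Linear(200, 100), nn.ReLU(),
            nn.Linear(100, num_classes),   # logits over the 10 digit classes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)
```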
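The paper's Algorithm 1 aligns the neurons of one model to those of another, layer by layer, via optimal transport (with a ground cost built from either activations or weights) and then averages the aligned weights. The sketch below is a simplified, weight-based ("wts") variant for two fully connected networks of identical architecture: with uniform marginals and equal layer widths, the exact OT plan is a scaled permutation matrix, so SciPy's Hungarian solver stands in for a general OT solver. Function and variable names are illustrative, biases and normalization layers are omitted, and the code is not taken from the authors' repository.

```python
# Simplified weight-based OT fusion of two MLPs with identical architectures.
# Assumptions: uniform marginals and equal layer widths, so the exact OT plan
# is (up to scaling) a permutation and the Hungarian algorithm can replace a
# general OT solver. Biases and normalization layers are ignored for brevity.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def fuse_two_mlps(weights_a, weights_b):
    """weights_a, weights_b: lists of weight matrices with shape (out_dim, in_dim)."""
    fused = []
    prev_perm = np.eye(weights_b[0].shape[1])    # input features are already aligned
    for l, (wa, wb) in enumerate(zip(weights_a, weights_b)):
        wb_hat = wb @ prev_perm.T                # re-index incoming edges by the previous alignment
        if l < len(weights_a) - 1:               # hidden layer: align B's neurons to A's
            cost = cdist(wa, wb_hat, metric="sqeuclidean")
            rows, cols = linear_sum_assignment(cost)
            perm = np.zeros_like(cost)
            perm[rows, cols] = 1.0               # perm[i, j] = 1 pairs A's neuron i with B's neuron j
            wb_hat = perm @ wb_hat               # permute B's neurons into A's ordering
            prev_perm = perm
        else:
            prev_perm = np.eye(wa.shape[0])      # output units (class logits) already correspond
        fused.append(0.5 * (wa + wb_hat))        # vanilla averaging after alignment
    return fused
```

Fusing more than two models, activation-based costs, non-uniform marginals, and convolutional or residual layers require the additional machinery described in the paper and implemented in the linked repository.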
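For scanning convenience, the hyperparameters quoted in the Experiment Setup row can be collected as below. Key names are ours; values not quoted in that row (optimizer, learning-rate schedules, etc.) are deliberately left out.

```python
# Hyperparameters quoted in the Experiment Setup row, gathered into one
# illustrative dictionary. Key names are ours, not the authors'.
experiment_setup = {
    "minibatch_samples_for_fusion": (100, 400),            # mini-batch size range quoted for the fusion experiments
    "split_training_epochs": 10,                           # each model trained on its data portion for 10 epochs
    "finetune_epochs": {"VGG11": 100, "RESNET18": 120},    # best result over several finetuning LR schedules
}
```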