Model Fusion via Optimal Transport
Authors: Sidak Pal Singh, Martin Jaggi
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our model fusion approach on standard image classification datasets, like CIFAR10 with commonly used convolutional neural networks (CNNs) such as VGG11 [23] and residual networks like ResNet18 [24]; and on MNIST, we use a fully connected network with 3 hidden layers of size 400, 200, 100, which we refer to as MLPNET. As baselines, we mention the performance of prediction ensembling and vanilla averaging, besides that of individual models. |
| Researcher Affiliation | Academia | Sidak Pal Singh ETH Zurich, Switzerland contact@sidakpal.com; Martin Jaggi EPFL, Switzerland martin.jaggi@epfl.ch |
| Pseudocode | Yes | Algorithm 1: Model Fusion (with ψ = {acts, wts} alignment) |
| Open Source Code | Yes | The code is available at the following link, https://github.com/sidak/otfusion. |
| Open Datasets | Yes | We test our model fusion approach on standard image classification datasets, like CIFAR10 with commonly used convolutional neural networks (CNNs) such as VGG11 [23] and residual networks like ResNet18 [24]; and on MNIST, we use a fully connected network with 3 hidden layers of size 400, 200, 100, which we refer to as MLPNET. |
| Dataset Splits | No | The paper mentions training and test sets and states that "All the performance scores are test accuracies," deferring full experimental details to an appendix that is not provided. It does not explicitly specify validation dataset splits or percentages. |
| Hardware Specification | Yes | To give a concrete estimate, the time taken to fuse six VGG11 models is 15 seconds on 1 Nvidia V100 GPU (c.f. Section S1.4 for more details). |
| Software Dependencies | No | The paper mentions using "exact OT solvers" and optimization methods like SGD, but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions) that would be needed for replication. |
| Experiment Setup | Yes | We typically consider a mini-batch of 100 to 400 samples for these experiments. Both are trained on their portions of the data for 10 epochs, and other training settings are identical. The finetuning scores for vanilla and OT averaging correspond to their best obtained results, when retrained with several finetuning learning rate schedules for a total of 100 and 120 epochs in case of VGG11 and RESNET18 respectively. |
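
For concreteness, the MLPNET described in the Open Datasets row above (a fully connected network with hidden layers of size 400, 200, 100 for MNIST) can be sketched as follows. This is a minimal illustration assuming PyTorch; the class name `MLPNet` and the choice of ReLU activations are assumptions, not details taken from the paper or its repository.

```python
# Minimal sketch of the MLPNET architecture quoted above (MNIST input,
# hidden layers 400 -> 200 -> 100, 10 output classes). Illustrative only.
import torch.nn as nn


class MLPNet(nn.Module):
    def __init__(self, in_dim: int = 28 * 28, num_classes: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_dim, 400), nn.ReLU(),
            nn.Linear(400, 200), nn.ReLU(),
            nn.Linear(200, 100), nn.ReLU(),
            nn.Linear(100, num_classes),
        )

    def forward(self, x):
        return self.net(x)
```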
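Likewise, the Pseudocode row refers to Algorithm 1, which aligns the neurons of one model to another via optimal transport before averaging. The sketch below illustrates the idea for the "wts" variant on two fully connected models, assuming the POT library (`ot`), uniform marginals over neurons, an exact EMD solver, and no bias terms; the function `fuse_two_mlps` and these choices are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Minimal sketch of layer-wise OT ("wts") weight alignment and averaging for
# two fully connected models; assumes the POT library and ignores biases.
# Illustration only, not the authors' implementation.
import numpy as np
import ot  # POT: Python Optimal Transport


def fuse_two_mlps(weights_a, weights_b):
    """weights_a, weights_b: lists of arrays, layer l of shape (out_l, in_l).
    Model A's neurons are transported onto model B's layer by layer, then the
    aligned weights are averaged (the simple two-model, equal-weight case)."""
    fused = []
    prev_map = np.eye(weights_a[0].shape[1])  # identity over raw input features
    for idx, (w_a, w_b) in enumerate(zip(weights_a, weights_b)):
        # Re-index incoming connections to undo the previous layer's transport.
        w_a = w_a @ prev_map.T
        if idx == len(weights_a) - 1:
            # Output neurons (class logits) are already aligned across models.
            fused.append(0.5 * (w_a + w_b))
            break
        n_a, n_b = w_a.shape[0], w_b.shape[0]
        # Ground cost between neurons: squared distance of incoming weight vectors.
        cost = ot.dist(w_a, w_b, metric="sqeuclidean")
        # Exact OT plan with uniform marginals over neurons, shape (n_a, n_b).
        plan = ot.emd(np.full(n_a, 1.0 / n_a), np.full(n_b, 1.0 / n_b), cost)
        # Barycentric projection: maps A's neuron space onto B's neuron ordering.
        prev_map = (plan / plan.sum(axis=0, keepdims=True)).T  # shape (n_b, n_a)
        fused.append(0.5 * (prev_map @ w_a + w_b))
    return fused
```

With equal layer widths and uniform marginals, the exact plan reduces to a (scaled) permutation, so this amounts to permuting model A's neurons to match model B before averaging; the fused weights would typically be finetuned afterwards, as in the Experiment Setup row above.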