Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging

Authors: Max Zimmer, Christoph Spiegel, Sebastian Pokutta

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 3, we experimentally validate our findings across image classification, semantic segmentation, and neural machine translation architectures and datasets.
Researcher Affiliation | Academia | Max Zimmer¹, Christoph Spiegel¹ & Sebastian Pokutta¹,²; ¹Department for AI in Society, Science, and Technology, Zuse Institute Berlin, Germany; ²Institute of Mathematics, Technische Universität Berlin, Germany; {zimmer,spiegel,pokutta}@zib.de
Pseudocode | Yes | Figure 2: Left: Sketch of the algorithm for a single phase and m = 3. Right: Pseudocode for SMS. (A hedged code sketch of this phase appears after the table.)
Open Source Code | Yes | For reproducibility, our implementation is available at github.com/ZIB-IOL/SMS.
Open Datasets | Yes | We evaluate our approach on well-known datasets for image recognition, semantic segmentation, and neural machine translation (NMT), including ImageNet-1K (Russakovsky et al., 2015), CIFAR-10/100 (Krizhevsky et al., 2009), CelebA (Liu et al., 2015), Cityscapes (Cordts et al., 2016), and WMT16 DE-EN (Bojar et al., 2016).
Dataset Splits | Yes | For validation, we use 10% of the training data. (A sketch of such a split appears after the table.)
Hardware Specification | No | The paper mentions 'FLOPs are computed using a single test batch' but does not specify any particular hardware used for running the experiments (e.g., specific GPU or CPU models, memory details).
Software Dependencies | No | The paper mentions using SGD as an optimizer and adapting code from the ShrinkBench framework, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | Table 3 shows the exact pretraining settings for each dataset-architecture pair, reporting the number of epochs used for pretraining, the batch size, the weight decay, as well as the learning rate used. The exact retraining hyperparameters are specified explicitly in the descriptions of each experiment or in the corresponding subsection in Appendix B.
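
Figure 2 of the paper gives pseudocode for SMS, which retrains m copies of a pruned model under varied randomness and averages them into a single sparse model per phase. Below is a minimal PyTorch-style sketch of that retrain-and-average step, not the authors' implementation (their code is at github.com/ZIB-IOL/SMS); the names `sms_phase` and `retrain`, the seed handling, and the `m = 3` default are illustrative assumptions.

```python
# Minimal sketch of one SMS phase (retrain m copies, then average), assuming
# PyTorch. `sms_phase`, `retrain`, and the seed handling are illustrative;
# the authors' actual implementation is at github.com/ZIB-IOL/SMS.
import copy

import torch


def average_state_dicts(state_dicts):
    """Average the parameters of models that share one sparsity mask."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        avg[key] = stacked.mean(dim=0).to(avg[key].dtype)
    return avg


def sms_phase(pruned_model, retrain, m=3):
    """Retrain m copies of one pruned model with varied randomness, then average.

    All copies start from the same pruned model and thus share its sparsity
    mask; assuming `retrain` keeps that mask fixed, averaged zeros stay zero,
    so the resulting soup is itself sparse.
    """
    candidates = []
    for seed in range(m):
        torch.manual_seed(seed)  # vary batch order, augmentation, dropout
        candidate = copy.deepcopy(pruned_model)
        retrain(candidate)       # user-supplied retraining loop
        candidates.append(candidate.state_dict())
    pruned_model.load_state_dict(average_state_dicts(candidates))
    return pruned_model
```

In a full run, each averaged soup would be pruned again before the next phase, and the paper varies more than the seed (e.g., retraining hyperparameters) across the m candidates.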
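The dataset-splits row quotes a 90/10 train/validation split. A minimal sketch of such a split, assuming torchvision's CIFAR-10 and an arbitrary seed (both illustrative; the paper does not tie the split to a specific loader):

```python
# Hypothetical 90/10 train/validation split; the dataset choice and seed are
# illustrative assumptions, not the paper's exact data pipeline.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

train_full = datasets.CIFAR10(root="data", train=True, download=True,
                              transform=transforms.ToTensor())
n_val = len(train_full) // 10  # hold out 10% of the training data
train_set, val_set = random_split(
    train_full,
    [len(train_full) - n_val, n_val],
    generator=torch.Generator().manual_seed(0),
)
```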