Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging
Authors: Max Zimmer, Christoph Spiegel, Sebastian Pokutta
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 3, we experimentally validate our findings across image classification, semantic segmentation, and neural machine translation architectures and datasets. |
| Researcher Affiliation | Academia | Max Zimmer1, Christoph Spiegel1 & Sebastian Pokutta1,2 1Department for AI in Society, Science, and Technology, Zuse Institute Berlin, Germany 2Institute of Mathematics, Technische Universität Berlin, Germany {zimmer,spiegel,pokutta}@zib.de |
| Pseudocode | Yes | Figure 2: Left: Sketch of the algorithm for a single phase and m = 3. Right: Pseudocode for SMS. |
| Open Source Code | Yes | For reproducibility, our implementation is available at github.com/ZIB-IOL/SMS. |
| Open Datasets | Yes | We evaluate our approach on well-known datasets for image recognition, semantic segmentation, and neural machine translation (NMT), including ImageNet-1K (Russakovsky et al., 2015), CIFAR-10/100 (Krizhevsky et al., 2009), CelebA (Liu et al., 2015), Cityscapes (Cordts et al., 2016), WMT16 DE-EN (Bojar et al., 2016) |
| Dataset Splits | Yes | For validation, we use 10% of the training data. |
| Hardware Specification | No | The paper mentions 'FLOPs are computed using a single test batch' but does not specify any particular hardware used for running the experiments (e.g., specific GPU or CPU models, memory details). |
| Software Dependencies | No | The paper mentions using SGD as an optimizer and adapting code from the ShrinkBench framework, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | Table 3 shows the exact pretraining settings for each dataset-architecture pair, reporting the number of epochs used for pretraining, the batch size, weight decay as well as the learning rate used. The exact retraining hyperparameters are specified explicitly in the descriptions of each experiment or in the corresponding subsection in Appendix B. |
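The Pseudocode row above refers to SMS, whose core step is averaging several retrained copies of the same pruned model, all of which share one sparsity mask. As a rough illustrative sketch only, not the authors' implementation (see github.com/ZIB-IOL/SMS for that), uniform averaging of PyTorch state_dicts could look like the snippet below; the function name, the plain state_dict interface, and the example file paths are assumptions made for illustration.

```python
import copy
import torch


def average_checkpoints(state_dicts):
    """Uniformly average state_dicts of models retrained from the same pruned
    checkpoint. Because all candidates share one sparsity mask, weights that
    are zero in every candidate stay zero after averaging, so the averaged
    model keeps the same sparsity level."""
    avg = copy.deepcopy(state_dicts[0])
    for key, value in avg.items():
        if isinstance(value, torch.Tensor) and value.is_floating_point():
            avg[key] = torch.stack([sd[key] for sd in state_dicts]).mean(dim=0)
    return avg


# Hypothetical usage: load m retrained candidates and average them into a "soup".
# candidate_paths = ["candidate_0.pt", "candidate_1.pt", "candidate_2.pt"]
# soup = average_checkpoints([torch.load(p, map_location="cpu") for p in candidate_paths])
# model.load_state_dict(soup)
```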