Comparing Rewinding and Fine-tuning in Neural Network Pruning
Authors: Alex Renda, Jonathan Frankle, Michael Carbin
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study neural network pruning on a variety of standard architectures for image classification and machine translation. Specifically, we consider ResNet-56 (He et al., 2016) for CIFAR-10 (Krizhevsky, 2009), ResNet-34 and ResNet-50 (He et al., 2016) for ImageNet (Russakovsky et al., 2015), and GNMT (Wu et al., 2016) for WMT16 EN-DE. |
| Researcher Affiliation | Academia | Alex Renda, MIT CSAIL (renda@csail.mit.edu); Jonathan Frankle, MIT CSAIL (jfrankle@csail.mit.edu); Michael Carbin, MIT CSAIL (mcarbin@csail.mit.edu) |
| Pseudocode | Yes | Algorithm 1: Our pruning algorithm. (Minimal sketches of this loop and of its retraining step appear after the table.) |
| Open Source Code | Yes | Our implementation and the data from the experiments in this paper are available at: https://github.com/lottery-ticket/rewinding-iclr20-public |
| Open Datasets | Yes | We study neural network pruning on a variety of standard architectures for image classification and machine translation. Specifically, we consider ResNet-56 (He et al., 2016) for CIFAR-10 (Krizhevsky, 2009), ResNet-34 and ResNet-50 (He et al., 2016) for ImageNet (Russakovsky et al., 2015), and GNMT (Wu et al., 2016) for WMT16 EN-DE. |
| Dataset Splits | Yes | For vision networks, we use 20% of the original test set, selected at random, as the validation set; the remainder of the original test set is used to report test accuracies. For WMT16 EN-DE, we use newstest2014 as the validation set (following Wu et al., 2016), and newstest2015 as the test set (following Zhu & Gupta, 2018). (A sketch of the vision split appears after the table.) |
| Hardware Specification | Yes | We gratefully acknowledge the support of Google, which provided us with access to the TPU resources necessary to conduct experiments on ImageNet and WMT through the TensorFlow Research Cloud. In particular, we express our gratitude to Zak Stone. We gratefully acknowledge the support of IBM, which provided us with access to the GPU resources necessary to conduct experiments on CIFAR-10 through the MIT-IBM Watson AI Lab. |
| Software Dependencies | No | The paper mentions optimizers such as Nesterov SGD and Adam, refers to TensorFlow via the 'TensorFlow Research Cloud', and links to GitHub repositories for some models. However, it does not specify version numbers for any software components or libraries, which reproducibility requires. |
| Experiment Setup | Yes | Table 1: Networks, datasets, and hyperparameters. We use standard implementations available online and standard hyperparameters. All accuracies are in line with baselines reported for these networks (Liu et al., 2019; He et al., 2018; Gale et al., 2019; Wu et al., 2016; Zhu & Gupta, 2018). The table details the optimizer, learning rate (with schedule), batch size, weight decay, and number of epochs for each network/dataset. |
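
For concreteness, here is a minimal NumPy sketch of the iterative magnitude-pruning loop that Algorithm 1 describes. The `retrain` argument and the no-op retrainer in the toy run are hypothetical placeholders, not the paper's implementation; in the paper, retraining means running SGD for some number of epochs using one of the three techniques sketched next.

```python
import numpy as np

def magnitude_prune(weights, mask, fraction=0.2):
    # Zero out the lowest-magnitude `fraction` of the still-unpruned
    # weights (global magnitude pruning, per round of Algorithm 1).
    alive = np.abs(weights[mask])
    threshold = np.quantile(alive, fraction)
    return mask & (np.abs(weights) > threshold)

def iterative_prune(weights, retrain, rounds=3, fraction=0.2):
    # Each round prunes `fraction` of the surviving weights, then
    # retrains; `retrain` stands in for fine-tuning, weight rewinding,
    # or learning rate rewinding (see the next sketch).
    mask = np.ones_like(weights, dtype=bool)
    for _ in range(rounds):
        mask = magnitude_prune(weights, mask, fraction)
        weights = retrain(weights * mask, mask)
    return weights * mask, mask

# Toy run with a no-op "retrainer"; real retraining is epochs of SGD.
rng = np.random.default_rng(0)
weights, mask = iterative_prune(rng.normal(size=1000), retrain=lambda w, m: w)
print(f"sparsity: {1 - mask.mean():.2f}")  # ~0.49 after three 20% rounds
```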
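The paper's central comparison is between three ways of instantiating that retraining step. The sketch below states the difference in code. Here `train_fn` (assumed to run one epoch of SGD per learning rate it is given) and the epoch-(T − t) `checkpoint` are assumptions introduced for illustration, not the paper's API; T denotes the original training length in epochs.

```python
def finetune(final_weights, mask, lr_schedule, t, train_fn):
    # Fine-tuning: continue training the pruned network for t epochs
    # at the final (lowest) learning rate of the original schedule.
    return train_fn(final_weights * mask, mask, [lr_schedule[-1]] * t)

def weight_rewind(checkpoint, mask, lr_schedule, t, train_fn):
    # Weight rewinding: reset surviving weights to their values from
    # epoch T - t, then rerun the last t epochs of the original
    # schedule. `checkpoint` holds the saved epoch-(T - t) weights.
    return train_fn(checkpoint * mask, mask, lr_schedule[-t:])

def lr_rewind(final_weights, mask, lr_schedule, t, train_fn):
    # Learning rate rewinding: keep the final weights, but retrain
    # using the learning rate schedule from the last t epochs.
    return train_fn(final_weights * mask, mask, lr_schedule[-t:])
```

Weight rewinding restores both the weights and the learning rate schedule from epoch T − t; learning rate rewinding restores only the schedule, which is what makes the two directly comparable against fine-tuning at a fixed retraining budget of t epochs.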
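Finally, a minimal sketch of the random validation split quoted under Dataset Splits, assuming a CIFAR-10-sized test set of 10,000 examples; the function name and seed are illustrative, not taken from the paper's code.

```python
import numpy as np

def split_vision_test_set(num_test, val_fraction=0.2, seed=0):
    # Randomly hold out `val_fraction` of the original test set as a
    # validation set; the remainder is used to report test accuracy.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_test)
    n_val = int(val_fraction * num_test)
    return perm[:n_val], perm[n_val:]  # (validation idx, test idx)

val_idx, test_idx = split_vision_test_set(10_000)  # CIFAR-10 test set
print(len(val_idx), len(test_idx))                 # -> 2000 8000
```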