Comparing Rewinding and Fine-tuning in Neural Network Pruning

Authors: Alex Renda, Jonathan Frankle, Michael Carbin

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study neural network pruning on a variety of standard architectures for image classification and machine translation. Specifically, we consider ResNet-56 (He et al., 2016) for CIFAR-10 (Krizhevsky, 2009), ResNet-34 and ResNet-50 (He et al., 2016) for ImageNet (Russakovsky et al., 2015), and GNMT (Wu et al., 2016) for WMT16 EN-DE.
Researcher Affiliation | Academia | Alex Renda, MIT CSAIL (renda@csail.mit.edu); Jonathan Frankle, MIT CSAIL (jfrankle@csail.mit.edu); Michael Carbin, MIT CSAIL (mcarbin@csail.mit.edu)
Pseudocode | Yes | Algorithm 1: Our pruning algorithm
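The paper's Algorithm 1 covers iterative magnitude pruning with rewinding: train to completion, prune the smallest-magnitude weights, rewind the surviving weights to their values from an earlier epoch, and retrain. As a hedged illustration only (not the authors' implementation), the loop might look like the sketch below; `train`, `magnitude_prune`, and all parameter names are assumptions introduced here:

```python
import numpy as np

def magnitude_prune(weights, fraction):
    """Global magnitude pruning: zero the smallest-magnitude weights.

    `weights` maps layer name -> numpy array; `fraction` is the fraction
    of all weights to remove. Returns a binary mask per layer.
    """
    flat = np.concatenate([np.abs(w).ravel() for w in weights.values()])
    threshold = np.quantile(flat, fraction)
    return {name: (np.abs(w) > threshold).astype(w.dtype)
            for name, w in weights.items()}

def prune_with_rewinding(init_weights, train, fraction, rounds,
                         rewind_epoch, total_epochs):
    """Iterative pruning with weight rewinding (illustrative sketch).

    `train(weights, masks, start_epoch, end_epoch)` is an assumed
    caller-supplied function that trains the masked weights and returns
    (final_weights, snapshot_of_weights_at_rewind_epoch).
    """
    masks = {name: np.ones_like(w) for name, w in init_weights.items()}
    weights, rewind_point = train(init_weights, masks, 0, total_epochs)
    for _ in range(rounds):
        masks = magnitude_prune(weights, fraction)
        # Rewind surviving weights to their values at `rewind_epoch`,
        # then retrain for the remaining epochs with the mask fixed.
        weights = {n: rewind_point[n] * masks[n] for n in weights}
        weights, _ = train(weights, masks, rewind_epoch, total_epochs)
    return weights, masks
```

The paper also studies learning-rate rewinding, which would instead restart the learning-rate schedule from `rewind_epoch` while keeping the trained weight values; that variant is not shown here.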
Open Source Code | Yes | Our implementation and the data from the experiments in this paper are available at: https://github.com/lottery-ticket/rewinding-iclr20-public
Open Datasets | Yes | We study neural network pruning on a variety of standard architectures for image classification and machine translation. Specifically, we consider ResNet-56 (He et al., 2016) for CIFAR-10 (Krizhevsky, 2009), ResNet-34 and ResNet-50 (He et al., 2016) for ImageNet (Russakovsky et al., 2015), and GNMT (Wu et al., 2016) for WMT16 EN-DE.
Dataset Splits | Yes | For vision networks, we use 20% of the original test set, selected at random, as the validation set; the remainder of the original test set is used to report test accuracies. For WMT16 EN-DE, we use newstest2014 as the validation set (following Wu et al., 2016), and newstest2015 as the test set (following Zhu & Gupta, 2018).
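The vision split described above (a random 20% of the original test set held out for validation, the rest kept for reporting test accuracy) can be sketched in a few lines. The seed and the index-based mechanism below are assumptions for illustration, not details taken from the paper:

```python
import numpy as np

def split_test_set(num_examples=10000, val_fraction=0.2, seed=0):
    """Partition test-set indices into validation and test subsets.

    Defaults match CIFAR-10's 10,000-example test set and the paper's
    20% validation fraction; the seed is an arbitrary assumption.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_examples)
    n_val = int(num_examples * val_fraction)
    return idx[:n_val], idx[n_val:]
```

For CIFAR-10 this yields a 2,000-example validation set and an 8,000-example test set.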
Hardware Specification | Yes | We gratefully acknowledge the support of Google, which provided us with access to the TPU resources necessary to conduct experiments on ImageNet and WMT through the TensorFlow Research Cloud. In particular, we express our gratitude to Zak Stone. We gratefully acknowledge the support of IBM, which provided us with access to the GPU resources necessary to conduct experiments on CIFAR-10 through the MIT-IBM Watson AI Lab.
Software Dependencies | No | The paper mentions optimizers such as Nesterov SGD and Adam, refers to TensorFlow via the TensorFlow Research Cloud, and links to GitHub repositories for some models. However, it does not specify version numbers for any software components or libraries, which is required for reproducibility.
Experiment Setup | Yes | Table 1: Networks, datasets, and hyperparameters. We use standard implementations available online and standard hyperparameters. All accuracies are in line with baselines reported for these networks (Liu et al., 2019; He et al., 2018; Gale et al., 2019; Wu et al., 2016; Zhu & Gupta, 2018). The table lists the optimizer, learning rate (with schedule), batch size, weight decay, and number of epochs for each network/dataset.