The Generalization-Stability Tradeoff In Neural Network Pruning

Authors: Brian Bartoldson, Ari Morcos, Adrian Barbu, Gordon Erlebacher

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We use VGG11 [41] with batch normalization and its dense layers replaced by a single dense layer, ResNet18, ResNet20, and ResNet56 [42]. Except where noted in Section 3.2, we train models with Adam [43], which was more helpful than SGD for recovering accuracy after pruning (perhaps related to the observation that recovery from pruning is harder when learning rates are low [44]). We use CIFAR10 data [45] without data augmentation, except in Section 3.2 where we note use of data augmentation (random crops and horizontal flips) and Appendix F where we use CIFAR100 with data augmentation to mimic the setup in [10]. (A minimal model/optimizer sketch appears after the table.)
Researcher Affiliation | Collaboration | Brian R. Bartoldson, Lawrence Livermore National Laboratory, bartoldson@llnl.gov; Ari S. Morcos, Facebook AI Research, arimorcos@fb.com; Adrian Barbu, Florida State University, abarbu@stat.fsu.edu; Gordon Erlebacher, Florida State University, gerlebacher@fsu.edu
Pseudocode | No | The paper describes its procedures in prose, such as the pruning instability computation illustrated in Figure 1, but provides no explicitly labeled pseudocode or algorithm blocks. (A hedged sketch of the instability computation appears after the table.)
Open Source Code | Yes | Our code is available at https://github.com/bbartoldson/GeneralizationStabilityTradeoff.
Open Datasets | Yes | We use CIFAR10 data [45] without data augmentation, except in Section 3.2 where we note use of data augmentation (random crops and horizontal flips) and Appendix F where we use CIFAR100 with data augmentation to mimic the setup in [10]. Reference: [45] Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. CIFAR-10 (Canadian Institute for Advanced Research). URL: http://www.cs.toronto.edu/~kriz/cifar.html. (A data-loading sketch with these augmentations appears after the table.)
Dataset Splits | No | The paper refers to 'train accuracy' and 'test accuracy' throughout the experimental sections and the discussion of generalization gaps, indicating that the data was split. However, it does not explicitly state the percentages, sample counts, or methodology for these splits (e.g., an 80/10/10 split, per-set sample counts, or a reference to the predefined splits of the cited datasets).
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions software components such as Adam [43] for optimization and cites PyTorch [61] in the references, but it does not specify version numbers for any software libraries, frameworks, or dependencies used in the experiments.
Experiment Setup | Yes | We set batch size to 128. Except where noted in Section 3.2, we train models with Adam [43]... For each layer of a model, the pruning schedule specifies the epochs on which pruning iterations occur (for example, two configurations in Figure 2 prune the last VGG11 convolutional layer every 40 epochs between epochs 7 and 247). Our VGG11 and ResNet18 experiments prune just the last four convolutional layers, with total pruning percentages {30%, 30%, 30%, 90%} and {25%, 40%, 25%, 95%}, respectively. (A pruning-schedule sketch appears after the table.)
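
To make the reported model setup concrete, here is a minimal sketch of the VGG11 configuration described in the Research Type row: torchvision's `vgg11_bn` stands in for the paper's own implementation (an assumption), with the standard three-layer dense head replaced by a single dense layer. The 512-to-10 layer width is assumed from VGG's 512-channel final feature map and CIFAR-10's 10 classes; Adam with its default hyperparameters matches the paper's stated optimizer choice.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg11_bn

# VGG11 with batch normalization; torchvision's version is a stand-in for
# the paper's implementation (the paper also uses ResNet18/20/56).
model = vgg11_bn(num_classes=10)

# Replace the three-layer MLP head with a single dense layer, per the paper.
# With 32x32 CIFAR inputs, the feature extractor emits a 512x1x1 map, so the
# classifier input width of 512 is an assumption consistent with that shape.
model.avgpool = nn.AdaptiveAvgPool2d(1)
model.classifier = nn.Linear(512, 10)

# Adam is used for training except where Section 3.2 notes otherwise.
optimizer = torch.optim.Adam(model.parameters())
```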
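Since the paper describes the instability computation only in prose (see the Pseudocode row), the following is a hedged sketch of one plausible reading: instability as the relative drop in test accuracy from just before to just after a pruning iteration. The exact normalization in the paper's Figure 1 may differ (e.g., it may use the raw rather than relative drop), and `prune_step` is a hypothetical helper, not an API from the paper's code.

```python
import torch

@torch.no_grad()
def test_accuracy(model, loader, device="cpu"):
    """Fraction of correctly classified examples in `loader`."""
    model.eval()
    correct = total = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

def pruning_instability(model, prune_step, test_loader):
    """Instability of one pruning iteration: the relative accuracy drop
    across the prune. `prune_step` is any callable that prunes the model
    in place (hypothetical; the paper prunes filters by magnitude)."""
    acc_before = test_accuracy(model, test_loader)
    prune_step(model)
    acc_after = test_accuracy(model, test_loader)
    return (acc_before - acc_after) / acc_before
```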
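The data pipeline the Open Datasets row quotes (CIFAR-10, with random crops and horizontal flips where augmentation is enabled) can be sketched as below. The 4-pixel crop padding is the common CIFAR convention and is an assumption, not stated in the paper; the `train` flag selects CIFAR-10's predefined 50,000/10,000 train/test split, which is the only split the paper's train/test accuracies imply.

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Augmentation used in Section 3.2 (and, on CIFAR-100, in Appendix F):
# random crops and horizontal flips. Padding of 4 pixels is assumed.
augment = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# CIFAR-10 ships with a predefined 50,000/10,000 train/test split,
# selected via the `train` flag; no validation split is reported.
train_set = datasets.CIFAR10("data", train=True, download=True, transform=augment)
test_set = datasets.CIFAR10("data", train=False, download=True,
                            transform=transforms.ToTensor())

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)  # batch size 128 per the paper
test_loader = DataLoader(test_set, batch_size=128)
```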
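Finally, the pruning schedule from the Experiment Setup row can be made concrete with a small sketch: pruning iterations fire on fixed epochs (the Figure 2 example prunes every 40 epochs from epoch 7 to 247), and each targeted layer has a total percentage to remove by the end of training. Spreading each layer's total evenly across iterations is an assumption; the paper's exact per-iteration amounts may differ.

```python
# Pruning epochs for the Figure 2 example: 7, 47, 87, ..., 247.
PRUNE_EPOCHS = list(range(7, 248, 40))

# Total pruning percentages for the last four convolutional layers.
LAYER_TOTALS = {
    "vgg11":    [0.30, 0.30, 0.30, 0.90],
    "resnet18": [0.25, 0.40, 0.25, 0.95],
}

def per_iteration_fraction(total, n_iterations=len(PRUNE_EPOCHS)):
    """Share of a layer pruned at each scheduled epoch, assuming an even split."""
    return total / n_iterations

if __name__ == "__main__":
    for model_name, totals in LAYER_TOTALS.items():
        steps = [round(per_iteration_fraction(t), 4) for t in totals]
        print(model_name, "prune epochs:", PRUNE_EPOCHS)
        print(model_name, "per-iteration fractions (last four conv layers):", steps)
```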