Pruning’s Effect on Generalization Through the Lens of Training and Regularization

Authors: Tian Jin, Michael Carbin, Dan Roy, Jonathan Frankle, Gintare Karolina Dziugaite

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate that both factors are essential to fully explaining pruning's impact on generalization. We use standard architectures: LeNet [32], VGG-16 [56], ResNet-20, ResNet-32 and ResNet-50 [24], and train on benchmarks (MNIST, CIFAR-10, CIFAR-100, ImageNet).
Researcher Affiliation | Collaboration | Tian Jin (1), Michael Carbin (1), Daniel M. Roy (2), Jonathan Frankle (3), Gintare Karolina Dziugaite (4); (1) MIT, (2) University of Toronto and Vector Institute, (3) MosaicML, (4) Google Research, Brain Team.
Pseudocode | No | The paper describes the iterative magnitude pruning (IMP) algorithm and its components in prose, but it does not include a formally structured pseudocode or algorithm block.
Open Source Code | No | Checklist question 3(a), 'Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)?', is answered with '[No]'.
Open Datasets | Yes | We use standard architectures: LeNet [32], VGG-16 [56], ResNet-20, ResNet-32 and ResNet-50 [24], and train on benchmarks (MNIST, CIFAR-10, CIFAR-100, ImageNet) using standard hyperparameter settings and standard cross-entropy loss function [13, 14, 66].
Dataset Splits | No | The paper mentions 'best validation error' and an 'optimally sparse model', which implies the use of a validation set, but it does not give specific training/validation/test splits (e.g., percentages or sample counts).
Hardware Specification | Yes | We use PyTorch [50] on TPUs with the OpenLTH library [13].
Software Dependencies | No | The paper names PyTorch and the OpenLTH library as software used, but it does not provide version numbers for these components.
Experiment Setup | Yes | Following Frankle and Carbin [13], Frankle et al. [14], we set the t in IMP to t = 0 for the MNIST-LeNet benchmark and t = 10 for the others. Appendix B shows further details.
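Since the paper gives no pseudocode for IMP (see the Pseudocode row above), the following is a minimal sketch of one iterative magnitude pruning round with weight rewinding to step t, reconstructed from the prose description. The function name, flat-array weight layout, and default prune fraction are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def imp_round(weights, mask, weights_at_step_t, prune_fraction=0.2):
    """One round of iterative magnitude pruning (IMP) with weight rewinding.

    weights:           trained weights (flat array) after this round's training
    mask:              binary mask marking currently-unpruned weights
    weights_at_step_t: weights saved at early training step t (rewind point)
    """
    # Rank only the surviving (unmasked) weights by magnitude.
    surviving = np.flatnonzero(mask)
    magnitudes = np.abs(weights[surviving])

    # Prune the lowest-magnitude prune_fraction of surviving weights.
    n_prune = int(len(surviving) * prune_fraction)
    prune_idx = surviving[np.argsort(magnitudes)[:n_prune]]
    new_mask = mask.copy()
    new_mask[prune_idx] = 0

    # Rewind surviving weights to their values at step t;
    # the caller then retrains from this state and repeats.
    rewound = weights_at_step_t * new_mask
    return rewound, new_mask
```

With t = 0 (as in the MNIST-LeNet benchmark), `weights_at_step_t` is simply the initialization, recovering the original lottery-ticket rewinding scheme of Frankle and Carbin [13].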