On the Predictability of Pruning Across Scales

Authors: Jonathan S Rosenfeld, Jonathan Frankle, Michael Carbin, Nir Shavit

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that the error of iteratively magnitude-pruned networks empirically follows a scaling law with interpretable coefficients that depend on the architecture and task. We functionally approximate the error of the pruned networks, showing it is predictable in terms of an invariant tying width, depth, and pruning level, such that networks of vastly different pruned densities are interchangeable. We demonstrate the accuracy of this approximation over orders of magnitude in depth, width, dataset size, and density.
Researcher Affiliation | Academia | MIT CSAIL. Correspondence to: Jonathan Rosenfeld <jonsr@csail.mit.edu>.
Pseudocode | Yes | For a formal statement of this pruning algorithm, see Appendix A.
Open Source Code | No | The paper does not provide a direct link to open-source code for its methodology, nor does it explicitly state that code will be released.
Open Datasets | Yes | In the main body of the paper, we study the image classification tasks CIFAR-10 and ImageNet. Our scaling law predicts the error when training with the entire dataset and smaller subsamples. ... To subsample a dataset to a size of n, we randomly select n of the training examples without regard to individual classes such that in expectation we preserve the original dataset distribution (we always retain the entire test set). (A code sketch of this subsampling step appears after the table.)
Dataset Splits | No | The paper mentions training data, subsamples, and retaining the test set. It describes training three replicates with different seeds. However, it does not explicitly describe a separate validation split or how it was used in the experimental setup.
Hardware Specification | No | The paper mentions 'TPU resources' and 'GPU resources' provided by Google and IBM respectively, but it does not specify the exact models or configurations of these hardware components (e.g., specific TPU versions such as v2/v3, or GPU models such as V100/A100).
Software Dependencies | No | The paper does not provide specific software names with version numbers that would be necessary for reproduction.
Experiment Setup | Yes | We study iterative magnitude pruning (IMP)... IMP prunes by removing a fraction (typically 20%, as we do here) of individual weights with the lowest magnitudes... For IMP, we use a practice called weight rewinding... in which the values of unpruned weights are rewound to their values earlier in training (in our case, epoch 10) and the training process is repeated from there to completion. ... To achieve density levels below 80%, this process is repeated iteratively (pruning by 20%, rewinding, and retraining) until a desired density level is reached. ... See Appendix B for full details on architectures and hyperparameters. (A sketch of IMP with rewinding appears after the table.)
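
The "Open Datasets" row quotes the paper's subsampling rule: select n training examples at random without regard to class, so the class distribution is preserved only in expectation, and never subsample the test set. Below is a minimal sketch of that step, assuming a PyTorch-style Dataset; the function name, seed handling, and use of torch.utils.data.Subset are illustrative choices, not taken from the authors' (unreleased) code.

```python
# Minimal sketch of the paper's dataset-subsampling step.
# Names and seed handling are illustrative assumptions.
import numpy as np
from torch.utils.data import Dataset, Subset


def subsample_train_set(train_set: Dataset, n: int, seed: int = 0) -> Subset:
    """Randomly keep n training examples, ignoring class labels, so the
    class distribution is preserved only in expectation.
    The test set is never subsampled."""
    rng = np.random.default_rng(seed)
    keep = rng.choice(len(train_set), size=n, replace=False)
    return Subset(train_set, keep.tolist())
```

For example, `subsample_train_set(cifar10_train, n=len(cifar10_train) // 4)` keeps a quarter of the CIFAR-10 training set while the full test set is retained, matching the quoted procedure.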
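
The "Experiment Setup" row quotes the paper's description of iterative magnitude pruning (IMP) with weight rewinding. The following is a minimal PyTorch-style sketch of that loop under stated assumptions: the `train_to_completion` callback (which must apply the masks during training and return the epoch-10 state dict), the choice of prunable layers, and the global magnitude ranking are illustrative and not taken from the paper, whose formal statement and hyperparameters are in Appendices A and B.

```python
# Illustrative sketch of IMP with weight rewinding; not the authors' code.
import torch
import torch.nn as nn


def prunable_weights(model: nn.Module):
    # Assumption: only conv/linear weight tensors are pruned; biases and
    # normalization parameters stay dense.
    return [m.weight for m in model.modules()
            if isinstance(m, (nn.Conv2d, nn.Linear))]


def imp_with_rewinding(model, train_to_completion, target_density,
                       prune_fraction=0.2, rewind_epoch=10):
    """Iteratively prune the lowest-magnitude surviving weights (20% per
    iteration), rewind survivors to their epoch-10 values, and retrain,
    until the overall density reaches `target_density`."""
    weights = prunable_weights(model)
    masks = [torch.ones_like(w) for w in weights]

    # Initial training run; the (hypothetical) callback trains to completion
    # under the given masks and returns the epoch-`rewind_epoch` state dict.
    rewind_state = train_to_completion(model, masks, save_at_epoch=rewind_epoch)

    density = 1.0
    while density > target_density:
        # Rank surviving weights by magnitude (global ranking is an
        # assumption; the excerpt does not say whether pruning is per layer).
        surviving = torch.cat([w[m.bool()].abs()
                               for w, m in zip(weights, masks)])
        k = max(1, int(prune_fraction * surviving.numel()))
        threshold = torch.kthvalue(surviving, k).values
        masks = [m * (w.abs() > threshold).float()
                 for w, m in zip(weights, masks)]

        # Weight rewinding: restore epoch-10 values (pruned positions stay
        # zeroed by `masks` during retraining), then retrain to completion.
        model.load_state_dict(rewind_state)
        train_to_completion(model, masks, save_at_epoch=None)

        density = (sum(m.sum().item() for m in masks)
                   / sum(m.numel() for m in masks))
    return model, masks
```

In this sketch each iteration multiplies the density by 0.8, so reaching, say, 10% density takes roughly ten prune-rewind-retrain rounds (0.8^10 ≈ 0.11).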