Winning the Lottery Ahead of Time: Efficient Early Network Pruning

Authors: John Rachwan, Daniel Zügner, Bertrand Charpentier, Simon Geisler, Morgane Ayle, Stephan Günnemann

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically show that Early-CroP outperforms a rich set of baselines for many tasks (incl. classification, regression) and domains (incl. computer vision, natural language processing, and reinforcement learning). Early-CroP leads to accuracy comparable to dense training while outperforming pruning baselines.
Researcher Affiliation | Academia | Technical University of Munich, Germany.
Pseudocode | Yes | Algorithm 1 Early-CroP
Open Source Code | Yes | The code and further supplementary material is available online: www.cs.cit.tum.de/daml/early-crop/
Open Datasets | Yes | The datasets used for Image Classification are the common public benchmarks CIFAR10 (Krizhevsky & Hinton, 2009), CIFAR100 (Krizhevsky & Hinton, 2009), and Tiny-Imagenet (Deng et al., 2009).
Dataset Splits | No | The paper mentions training models for a certain number of epochs and reporting train and test accuracy, but it does not specify explicit validation splits (e.g., percentages like 80/10/10) or reference predefined validation sets used for hyperparameter tuning or early stopping.
Hardware Specification | Yes | We use a cloud instance of GTX 1080 Tis for all experiments. [...] We introduce the V100 16GB GPU (2.48$/h) and the V100 32GB GPU (4.96$/h).
Software Dependencies | No | For all experiments, we use the ADAM (Kingma & Ba, 2015) optimizer and a learning rate of 2e-3. The One Cycle Learning Rate scheduler is used to train all models except VGG16. The batch size used for CIFAR10 and CIFAR100 experiments is 256, while for Tiny-Imagenet it is 128. [...] We use the CUDA time measurement tool (Paszke et al., 2019). [...] We estimate CO2 emissions in g using the Code Carbon emissions tracker (Schmidt et al., 2021). The paper names software such as PyTorch and Code Carbon, and an optimizer (ADAM), but does not provide specific version numbers for these libraries or tools (see the timing and emissions sketch after the table).
Experiment Setup | Yes | For all experiments, we use the ADAM (Kingma & Ba, 2015) optimizer and a learning rate of 2e-3. The One Cycle Learning Rate scheduler is used to train all models except VGG16. The batch size used for CIFAR10 and CIFAR100 experiments is 256, while for Tiny-Imagenet it is 128. All sparse models are allowed to train the same number of epochs (80) to converge, which, except for LTR, includes the number of epochs required to extract the sparse model (see the training-setup sketch after the table).
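
The Experiment Setup row pins down a small set of concrete hyperparameters: ADAM with learning rate 2e-3, a OneCycle learning-rate schedule, batch size 256 on CIFAR10/CIFAR100 (128 on Tiny-Imagenet), and 80 training epochs. The sketch below shows one way these settings could map onto a PyTorch training loop; the ResNet18 model, the torchvision CIFAR10 loader, and the data transforms are placeholder assumptions, and the Early-CroP pruning step itself is not reproduced here.

```python
# Minimal sketch of the reported training configuration (not the authors' code):
# ADAM optimizer, lr 2e-3, OneCycle LR schedule, batch size 256 for CIFAR10,
# 80 training epochs. Model choice and the Early-CroP pruning step are omitted.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

EPOCHS, BATCH_SIZE, LR = 80, 256, 2e-3

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=BATCH_SIZE, shuffle=True, num_workers=4)

model = torchvision.models.resnet18(num_classes=10)  # placeholder architecture
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=LR)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=LR, epochs=EPOCHS, steps_per_epoch=len(train_loader))
criterion = nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        scheduler.step()  # OneCycle steps once per batch
```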
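
For the timing and emissions figures, the paper cites PyTorch's CUDA time measurement (Paszke et al., 2019) and the CodeCarbon emissions tracker (Schmidt et al., 2021) without version numbers. Below is a minimal sketch of how these two tools are commonly combined; the `run_training` workload is a hypothetical placeholder, and the exact measurement protocol used by the authors is not specified in the quoted text.

```python
# Hedged sketch of GPU timing with torch.cuda.Event and emissions tracking with
# CodeCarbon; requires a CUDA-capable GPU. `run_training` is a placeholder workload.
import torch
from codecarbon import EmissionsTracker

def run_training():
    # Placeholder for the actual (dense or pruned) training run.
    x = torch.randn(1024, 1024, device="cuda")
    for _ in range(100):
        _ = x @ x

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
tracker = EmissionsTracker()  # writes an emissions.csv report by default

tracker.start()
start.record()
run_training()
end.record()
torch.cuda.synchronize()       # wait for all queued GPU work to finish
emissions_kg = tracker.stop()  # CodeCarbon reports kg CO2-eq

print(f"GPU time: {start.elapsed_time(end):.1f} ms")
print(f"Estimated emissions: {emissions_kg * 1000:.2f} g CO2-eq")
```

The conversion to grams at the end mirrors the paper's statement that emissions are reported in g, since CodeCarbon returns its estimate in kilograms.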