Winning the Lottery Ahead of Time: Efficient Early Network Pruning
Authors: John Rachwan, Daniel Zügner, Bertrand Charpentier, Simon Geisler, Morgane Ayle, Stephan Günnemann
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that EarlyCroP outperforms a rich set of baselines for many tasks (incl. classification, regression) and domains (incl. computer vision, natural language processing, and reinforcement learning). EarlyCroP leads to accuracy comparable to dense training while outperforming pruning baselines. |
| Researcher Affiliation | Academia | 1Technical University Munich, Germany. |
| Pseudocode | Yes | Algorithm 1 Early-CroP |
| Open Source Code | Yes | The code and further supplementary material are available online: www.cs.cit.tum.de/daml/early-crop/ |
| Open Datasets | Yes | The datasets used for Image Classification are the common public benchmarks CIFAR10 (Krizhevsky & Hinton, 2009), CIFAR100 (Krizhevsky & Hinton, 2009), and Tiny-Imagenet (Deng et al., 2009). |
| Dataset Splits | No | The paper mentions training models for a certain number of epochs and reporting train and test accuracy, but it does not specify explicit validation splits (e.g., percentages like 80/10/10) or reference predefined validation sets used for hyperparameter tuning or early stopping. |
| Hardware Specification | Yes | We use a cloud instance of GTX 1080 Tis for all experiments. [...] We introduce the V100 16GB GPU ($2.48/h) and the V100 32GB GPU ($4.96/h). |
| Software Dependencies | No | For all experiments, we use the ADAM (Kingma & Ba, 2015) optimizer and a learning rate of 2e-3. The One Cycle Learning Rate scheduler is used to train all models except VGG16. The batch size used for CIFAR10 and CIFAR100 experiments is 256, while for Tiny-Imagenet it is 128. [...] We use the CUDA time measurement tool (Paszke et al., 2019). [...] We estimate CO2 emissions in g using the Code Carbon emissions tracker (Schmidt et al., 2021). The paper names software such as PyTorch and CodeCarbon, and an optimizer (ADAM), but does not provide specific version numbers for these libraries or tools. (A hedged usage sketch of this measurement tooling appears after the table.) |
| Experiment Setup | Yes | For all experiments, we use the ADAM (Kingma & Ba, 2015) optimizer and a learning rate of 2e-3. The One Cycle Learning Rate scheduler is used to train all models except VGG16. The batch size used for CIFAR10 and CIFAR100 experiments is 256, while for Tiny-Imagenet it is 128. All sparse models are allowed to train the same number of epochs (80) to converge, which, except for LTR, includes the number of epochs required to extract the sparse model. (A hedged sketch of this training configuration appears directly after the table.) |
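
The following is a minimal sketch of the training configuration quoted in the Experiment Setup row, assuming a standard PyTorch/torchvision pipeline. The architecture, data augmentation, and remaining Adam hyperparameters are not specified in the quotes above, so the choices below are placeholders rather than the authors' code.

```python
# Minimal sketch of the reported training configuration (assumptions noted inline).
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.models import resnet18  # placeholder architecture, not specified above

EPOCHS = 80        # all sparse models train for 80 epochs (per the quote)
BATCH_SIZE = 256   # CIFAR-10/100; the paper uses 128 for Tiny-Imagenet
LR = 2e-3          # Adam learning rate reported in the paper

train_set = datasets.CIFAR10(root="data", train=True, download=True,
                             transform=transforms.ToTensor())  # augmentation unspecified
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)

model = resnet18(num_classes=10)
optimizer = optim.Adam(model.parameters(), lr=LR)  # other Adam settings left at defaults
# One-cycle learning-rate schedule, stepped once per batch (used for all models except VGG16).
scheduler = optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=LR, epochs=EPOCHS, steps_per_epoch=len(train_loader))
criterion = nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        scheduler.step()
```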
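The Software Dependencies row names the CUDA time measurement tool and the CodeCarbon emissions tracker but gives no versions. The sketch below shows one plausible way to combine them, assuming recent PyTorch and codecarbon releases; `train_one_epoch` is a hypothetical placeholder, not a function from the paper.

```python
# Hedged sketch: timing a run with CUDA events and tracking CO2 with CodeCarbon.
import torch
from codecarbon import EmissionsTracker

def measure_run(train_one_epoch, epochs=80):
    """Return (wall time in seconds, emissions in grams CO2-eq) for a training run."""
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    tracker = EmissionsTracker()   # tracker version not reported in the paper
    tracker.start()
    start.record()
    for _ in range(epochs):
        train_one_epoch()          # hypothetical placeholder for the actual training step
    end.record()
    torch.cuda.synchronize()       # ensure all queued GPU work finishes before reading timers
    emissions_kg = tracker.stop()  # CodeCarbon reports kilograms of CO2-equivalent
    seconds = start.elapsed_time(end) / 1e3  # elapsed_time() returns milliseconds
    return seconds, emissions_kg * 1e3       # convert kg to grams, as reported in the paper
```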