Dynamic Model Pruning with Feedback
Authors: Tao Lin, Sebastian U. Stich, Luis Barba, Daniil Dmitriev, Martin Jaggi
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models. Moreover, their performance surpasses that of models generated by all previously proposed pruning schemes. (See also Section 5, EXPERIMENTS.) |
| Researcher Affiliation | Academia | Tao Lin EPFL, Switzerland tao.lin@epfl.ch Sebastian U. Stich EPFL, Switzerland sebastian.stich@epfl.ch Luis Barba EPFL & ETH Zurich, Switzerland luis.barba@inf.ethz.ch Daniil Dmitriev EPFL, Switzerland daniil.dmitriev@epfl.ch Martin Jaggi EPFL, Switzerland martin.jaggi@epfl.ch |
| Pseudocode | Yes | A.1 ALGORITHM, Algorithm 1: The detailed training procedure of DPF. (A sketch of one such training step follows the table.) |
| Open Source Code | No | The paper mentions adapting open-sourced code for baselines ('For all competitors, we adapted their open-sourced code and applied a consistent (and standard) training scheme over different methods to ensure a fair comparison.'), but does not provide an explicit statement or link for the open-source code of their proposed method (DPF). |
| Open Datasets | Yes | Datasets. We evaluated DPF on two image classification benchmarks: (1) CIFAR-10 (Krizhevsky & Hinton, 2009) (50K/10K training/test samples with 10 classes), and (2) ImageNet (Russakovsky et al., 2015) (1.28M/50K training/validation samples with 1000 classes). |
| Dataset Splits | Yes | Datasets. We evaluated DPF on two image classification benchmarks: (1) CIFAR-10 (Krizhevsky & Hinton, 2009) (50K/10K training/test samples with 10 classes), and (2) ImageNet (Russakovsky et al., 2015) (1.28M/50K training/validation samples with 1000 classes). (A CIFAR-10 data-loading sketch follows the table.) |
| Hardware Specification | Yes | We implemented our DPF in PyTorch (Paszke et al., 2017). All experiments were run on NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2017)' but does not provide a specific version number for PyTorch or other software dependencies. |
| Experiment Setup | Yes | Training schedules. For all competitors, we adapted their open-sourced code and applied a consistent (and standard) training scheme over different methods to ensure a fair comparison. Following the standard training setup for CIFAR-10, we trained ResNet-a for 300 epochs and decayed the learning rate by 10 when accessing 50% and 75% of the total training samples (He et al., 2016a; Huang et al., 2017); and we trained WideResNet-a-b following Zagoruyko & Komodakis (2016) for 200 epochs and decayed the learning rate by 5 when accessing 30%, 60% and 80% of the total training samples. For ImageNet training, we used the training scheme in (Goyal et al., 2017) for 90 epochs and decayed the learning rate by 10 at 30, 60, 80 epochs. For all datasets and models, we used mini-batch SGD with Nesterov momentum (factor 0.9) with a fine-tuned learning rate for DPF. We reused the tuned (or recommended) hyperparameters for our baselines (DSR and SM), and fine-tuned the optimizer and learning rate for One-shot P+FT, Incremental and SNIP. The mini-batch size is fixed to 128 for CIFAR-10 and 1024 for ImageNet regardless of models and methods. (An optimizer and learning-rate schedule sketch follows the table.) |
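
The Pseudocode row above points to Algorithm 1 in Appendix A.1, which trains with feedback: gradients are computed on the pruned model but applied to the dense weights. Below is a minimal sketch of that idea under stated assumptions; the function names, the layer-wise unstructured magnitude criterion, and the per-step re-masking are illustrative choices, not the authors' released implementation.

```python
import torch


def magnitude_mask(weight, sparsity):
    """Zero the `sparsity` fraction of smallest-magnitude entries, keep the rest
    (unstructured magnitude pruning; an assumed criterion for this sketch)."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.detach().abs().flatten().kthvalue(k).values
    return (weight.detach().abs() > threshold).to(weight.dtype)


def dpf_step(model, loss_fn, inputs, targets, optimizer, sparsity):
    """One training step in the spirit of DPF: evaluate the loss and gradients
    on the *pruned* weights, then apply the update to the *dense* weights,
    so the pruning error is fed back into later iterations."""
    dense_copies = []
    for p in model.parameters():
        dense_copies.append(p.detach().clone())           # remember dense weights
        p.data.mul_(magnitude_mask(p.data, sparsity))      # prune in place

    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)                 # forward pass on pruned model
    loss.backward()                                        # gradients w.r.t. pruned weights

    for p, w in zip(model.parameters(), dense_copies):
        p.data.copy_(w)                                    # restore dense weights
    optimizer.step()                                       # update dense weights
    return loss.item()
```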
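
The Open Datasets and Dataset Splits rows quote the standard CIFAR-10 split (50K training / 10K test images, 10 classes) and the mini-batch size of 128 used on CIFAR-10. A possible loading setup in PyTorch/torchvision is sketched below; the augmentation, download path, and worker count are common defaults, not details taken from the paper.

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Standard CIFAR-10 50K/10K train/test split with mini-batch size 128.
train_tf = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_set = datasets.CIFAR10("./data", train=True, download=True, transform=train_tf)
test_set = datasets.CIFAR10("./data", train=False, download=True, transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)
test_loader = DataLoader(test_set, batch_size=128, shuffle=False, num_workers=4)
```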
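
The Experiment Setup row describes mini-batch SGD with Nesterov momentum 0.9 and, for CIFAR-10 ResNets, 300 epochs with the learning rate divided by 10 at 50% and 75% of training. The sketch below assembles that optimizer and schedule; the initial learning rate and weight decay are placeholders, since the paper only states that the learning rate was fine-tuned per method.

```python
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR


def cifar_resnet_schedule(model, epochs=300, base_lr=0.1, weight_decay=1e-4):
    """SGD with Nesterov momentum 0.9 and a step schedule that divides the
    learning rate by 10 at 50% and 75% of training, as quoted above.
    base_lr and weight_decay are assumed placeholder values."""
    optimizer = SGD(model.parameters(), lr=base_lr, momentum=0.9,
                    nesterov=True, weight_decay=weight_decay)
    scheduler = MultiStepLR(optimizer,
                            milestones=[int(0.5 * epochs), int(0.75 * epochs)],
                            gamma=0.1)
    return optimizer, scheduler
```

With this setup, `scheduler.step()` would be called once per epoch after the training loop over mini-batches.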