Dynamic Model Pruning with Feedback

Authors: Tao Lin, Sebastian U. Stich, Luis Barba, Daniil Dmitriev, Martin Jaggi

ICLR 2020

Each reproducibility variable is listed below with its assessed result and the evidence the LLM quoted from the paper.

Research Type: Experimental
Evidence: "We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models. Moreover, their performance surpasses that of models generated by all previously proposed pruning schemes." (Abstract; see also Section 5, Experiments.)

Researcher Affiliation: Academia
Evidence: Tao Lin (EPFL, Switzerland; tao.lin@epfl.ch), Sebastian U. Stich (EPFL, Switzerland; sebastian.stich@epfl.ch), Luis Barba (EPFL & ETH Zurich, Switzerland; luis.barba@inf.ethz.ch), Daniil Dmitriev (EPFL, Switzerland; daniil.dmitriev@epfl.ch), Martin Jaggi (EPFL, Switzerland; martin.jaggi@epfl.ch).

Pseudocode: Yes
Evidence: Appendix A.1 ("Algorithm") gives Algorithm 1, "The detailed training procedure of DPF." (A minimal code sketch of this procedure appears after this table.)

Open Source Code: No
Evidence: The paper mentions adapting open-sourced code for the baselines ("For all competitors, we adapted their open-sourced code and applied a consistent (and standard) training scheme over different methods to ensure a fair comparison."), but provides no explicit statement or link for the source code of the proposed method (DPF).

Open Datasets: Yes
Evidence: "Datasets. We evaluated DPF on two image classification benchmarks: (1) CIFAR-10 (Krizhevsky & Hinton, 2009) (50K/10K training/test samples with 10 classes), and (2) ImageNet (Russakovsky et al., 2015) (1.28M/50K training/validation samples with 1000 classes)."

Dataset Splits: Yes
Evidence: The same passage specifies the splits: 50K/10K training/test samples for CIFAR-10 and 1.28M/50K training/validation samples for ImageNet.

Hardware Specification: Yes
Evidence: "We implemented our DPF in PyTorch (Paszke et al., 2017). All experiments were run on NVIDIA Tesla V100 GPUs."

Software Dependencies: No
Evidence: The paper cites PyTorch (Paszke et al., 2017) but gives no version numbers for PyTorch or any other software dependency.

Experiment Setup: Yes
Evidence: "Training schedules. For all competitors, we adapted their open-sourced code and applied a consistent (and standard) training scheme over different methods to ensure a fair comparison. Following the standard training setup for CIFAR-10, we trained ResNet-a for 300 epochs and decayed the learning rate by 10 when accessing 50% and 75% of the total training samples (He et al., 2016a; Huang et al., 2017); and we trained WideResNet-a-b as Zagoruyko & Komodakis (2016) for 200 epochs and decayed the learning rate by 5 when accessing 30%, 60% and 80% of the total training samples. For ImageNet training, we used the training scheme in (Goyal et al., 2017) for 90 epochs and decayed the learning rate by 10 at 30, 60, 80 epochs. For all datasets and models, we used mini-batch SGD with Nesterov momentum (factor 0.9) with a fine-tuned learning rate for DPF. We reused the tuned (or recommended) hyperparameters for our baselines (DSR and SM), and fine-tuned the optimizer and learning rate for One-shot P+FT, Incremental and SNIP. The mini-batch size is fixed to 128 for CIFAR-10 and 1024 for ImageNet regardless of models and methods." (A PyTorch sketch of these optimizer settings and schedules appears below.)
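
The Pseudocode row above refers to Algorithm 1 (Appendix A.1), whose core idea is error feedback: the stochastic gradient is evaluated at the pruned weights w̃ = m ⊙ w, but the update is applied to the full dense weights, so the error introduced by pruning flows back into later steps. Below is a minimal PyTorch sketch of one such step under stated assumptions: the helper names (`magnitude_masks`, `dpf_step`) are ours for illustration, not the authors' released code, and details such as which layers are pruned are simplified relative to the paper.

```python
import torch

def magnitude_masks(model, sparsity):
    """Binary mask per weight tensor, keeping the largest-magnitude entries.

    Illustrative helper: the paper recomputes such masks periodically during
    training (dynamic pruning), typically with a growing target sparsity.
    """
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:  # prune weight matrices / conv filters, not biases
            k = max(1, int(sparsity * p.numel()))  # number of entries to drop
            threshold = p.detach().abs().flatten().kthvalue(k).values
            masks[name] = (p.detach().abs() > threshold).float()
    return masks

def dpf_step(model, masks, optimizer, loss_fn, x, y):
    """One DPF-style step: gradient at the pruned weights, update on the dense ones."""
    dense = {n: p.detach().clone() for n, p in model.named_parameters()}
    with torch.no_grad():  # temporarily prune: w_tilde = mask * w
        for n, p in model.named_parameters():
            if n in masks:
                p.mul_(masks[n])
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()        # stochastic gradient evaluated at the pruned model
    with torch.no_grad():  # restore the dense weights: this is the feedback
        for n, p in model.named_parameters():
            p.copy_(dense[n])
    optimizer.step()       # dense weights take the step computed at the pruned point
    return loss.item()
```

In the paper the mask is recomputed every few iterations, so `magnitude_masks` would be called inside the training loop rather than once up front; at evaluation time only the pruned model w̃ is used.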
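
The schedules quoted in the Experiment Setup row map directly onto standard PyTorch components. The sketch below is one plausible rendering: the base learning rate is a placeholder (the paper says it was fine-tuned per method), while the milestones and decay factors are computed from the quoted percentages of the training budget.

```python
import torch

def make_optimizer_and_schedule(model, setting, lr=0.1):
    """SGD with Nesterov momentum 0.9 plus the step schedules quoted above.

    `setting` and the base `lr` are illustrative; the paper fine-tuned the
    learning rate per method. Batch sizes were 128 (CIFAR-10) / 1024 (ImageNet).
    """
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, nesterov=True)
    if setting == "cifar10-resnet":        # 300 epochs, lr/10 at 50% and 75%
        milestones, gamma = [150, 225], 0.1
    elif setting == "cifar10-wideresnet":  # 200 epochs, lr/5 at 30%, 60%, 80%
        milestones, gamma = [60, 120, 160], 0.2
    elif setting == "imagenet":            # 90 epochs, lr/10 at epochs 30, 60, 80
        milestones, gamma = [30, 60, 80], 0.1
    else:
        raise ValueError(f"unknown setting: {setting}")
    sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=milestones, gamma=gamma)
    return opt, sched
```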