Drawing Early-Bird Tickets: Toward More Efficient Training of Deep Networks
Authors: Haoran You, Chaojian Li, Pengfei Xu, Yonggan Fu, Yue Wang, Xiaohan Chen, Richard G. Baraniuk, Zhangyang Wang, Yingyan Lin
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments based on various deep networks and datasets validate: 1) the existence of EB tickets and the effectiveness of mask distance in efficiently identifying them; and 2) that the proposed efficient training via EB tickets can achieve up to 5.8× ∼ 10.7× energy savings while maintaining comparable or even better accuracy as compared to the most competitive state-of-the-art training methods, demonstrating a promising and easily adopted method for tackling the often cost-prohibitive deep network training. |
| Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA, {hy34, cl114, px5, yf22, yw68, yingyan.lin, richb}@rice.edu; Xiaohan Chen & Zhangyang Wang, Department of Computer Science and Engineering, Texas A&M University, College Station, TX 77843, USA, {chernxh, atlaswang}@tamu.edu |
| Pseudocode | Yes | Algorithm 1: The Algorithm for Searching EB Tickets (a hedged code sketch of this search follows the table) |
| Open Source Code | Yes | Codes available at https://github.com/RICE-EIC/Early-Bird-Tickets |
| Open Datasets | Yes | We perform ablation simulations using two representative deep models: VGG16 (Simonyan & Zisserman, 2014) and pre-activation residual networks-101 (PreResNet-101) (He et al., 2016b), on two popular datasets: CIFAR-10 and CIFAR-100. ... We consider training the VGG16 and PreResNet-101 models on both CIFAR-10/100 and ImageNet datasets... |
| Dataset Splits | No | The paper mentions 'minimum validation loss' but does not provide specific details on the validation dataset split (e.g., percentages or sample counts). |
| Hardware Specification | Yes | All the energy consumption of full-precision models is obtained by training the corresponding models on an embedded GPU (NVIDIA JETSON TX2). |
| Software Dependencies | No | The paper does not provide specific version numbers for any software components or libraries used (e.g., Python, PyTorch, TensorFlow, etc.). |
| Experiment Setup | Yes | For drawing the tickets, we adopt a standard training protocol (Liu et al., 2018b) for both CIFAR-10 and CIFAR-100: the training takes 160 epochs in total with a batch size of 256; the initial learning rate is set to 0.1 and is divided by 10 at the 80th and 120th epochs, respectively; the SGD solver is adopted with a momentum of 0.9 and a weight decay of 10⁻⁴. For retraining the tickets, we keep the same setting by default. (A hedged configuration sketch follows the table.) |
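The Pseudocode row above refers to the paper's mask-distance-based search for early-bird (EB) tickets. Below is a minimal sketch of that idea, assuming channel masks are obtained by thresholding batch-normalization scaling factors and compared with a normalized Hamming distance; the helper names (`channel_mask`, `mask_distance`, `train_until_eb_ticket`) and the defaults (`prune_ratio=0.5`, `eps=0.1`, a queue of 5 recent masks) are illustrative assumptions, not the paper's exact implementation (see the official repository for that).

```python
from collections import deque

import torch
import torch.nn as nn


def channel_mask(model, prune_ratio):
    """Binary mask over BN channels: keep channels whose |gamma| exceeds the
    global threshold implied by the pruning ratio (a common channel-pruning
    proxy; the paper's exact criterion may differ)."""
    gammas = torch.cat([m.weight.data.abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    k = max(int(prune_ratio * gammas.numel()), 1)
    threshold = torch.kthvalue(gammas, k).values
    return (gammas > threshold).float()


def mask_distance(mask_a, mask_b):
    """Normalized Hamming distance between two binary channel masks."""
    return (mask_a != mask_b).float().mean().item()


def train_until_eb_ticket(model, loader, optimizer, criterion,
                          prune_ratio=0.5, eps=0.1, queue_len=5,
                          max_epochs=160):
    """Sketch of the EB-ticket search: stop early once recent pruning masks
    stabilize, i.e. the largest distance between the current mask and the
    last few masks drops below eps."""
    recent_masks = deque(maxlen=queue_len)
    for epoch in range(max_epochs):
        model.train()
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()

        mask = channel_mask(model, prune_ratio)
        if len(recent_masks) == queue_len and \
                max(mask_distance(mask, m) for m in recent_masks) < eps:
            return epoch, mask  # EB ticket drawn well before max_epochs
        recent_masks.append(mask)
    return max_epochs, channel_mask(model, prune_ratio)
```

The returned mask would then be used to prune the network, and only the resulting subnetwork is retrained to the target accuracy, which is where the reported energy savings come from.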
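The Experiment Setup row quotes the CIFAR training protocol (160 epochs, batch size 256, SGD with momentum 0.9 and weight decay 10⁻⁴, learning rate 0.1 divided by 10 at epochs 80 and 120). A hedged PyTorch sketch of that schedule is shown below; the framework, the data augmentation, and the use of `torchvision.models.vgg16_bn` as a stand-in for the paper's CIFAR-adapted VGG16 are assumptions, since the paper does not specify software dependencies.

```python
import torch
import torchvision
import torchvision.transforms as T

# Standard CIFAR-10 augmentation and normalization (assumed, not stated in the paper).
transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=256,
                                           shuffle=True, num_workers=4)

# Stand-in model; the paper uses a CIFAR variant of VGG16.
model = torchvision.models.vgg16_bn(num_classes=10)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
# Divide the learning rate by 10 at the 80th and 120th epochs; 160 epochs total.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[80, 120], gamma=0.1)

for epoch in range(160):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

Per the quoted setup, the same protocol is kept by default when retraining the drawn tickets.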