Drawing Early-Bird Tickets: Toward More Efficient Training of Deep Networks
Authors: Haoran You, Chaojian Li, Pengfei Xu, Yonggan Fu, Yue Wang, Xiaohan Chen, Richard G. Baraniuk, Zhangyang Wang, Yingyan Lin
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments based on various deep networks and datasets validate: 1) the existence of EB tickets and the effectiveness of mask distance in efficiently identifying them; and 2) that the proposed efficient training via EB tickets can achieve up to 5.8× ∼ 10.7× energy savings while maintaining comparable or even better accuracy as compared to the most competitive state-of-the-art training methods, demonstrating a promising and easily adopted method for tackling the often cost-prohibitive deep network training. |
| Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA, {hy34, cl114, px5, yf22, yw68, yingyan.lin, richb}@rice.edu; Xiaohan Chen & Zhangyang Wang, Department of Computer Science and Engineering, Texas A&M University, College Station, TX 77843, USA, {chernxh, atlaswang}@tamu.edu |
| Pseudocode | Yes | Algorithm 1: The Algorithm for Searching EB Tickets (a hedged code sketch of this search follows the table) |
| Open Source Code | Yes | Codes available at https://github.com/RICE-EIC/Early-Bird-Tickets |
| Open Datasets | Yes | We perform ablation simulations using two representative deep models: VGG16 (Simonyan & Zisserman, 2014) and pre-activation residual networks-101 (PreResNet-101) (He et al., 2016b), on two popular datasets: CIFAR-10 and CIFAR-100. ... We consider training the VGG16 and PreResNet-101 models on both CIFAR-10/100 and ImageNet datasets... |
| Dataset Splits | No | The paper mentions 'minimum validation loss' but does not provide specific details on the validation dataset split (e.g., percentages or sample counts). |
| Hardware Specification | Yes | All the energy consumption of full-precision models is obtained by training the corresponding models on an embedded GPU (NVIDIA JETSON TX2). |
| Software Dependencies | No | The paper does not provide specific version numbers for any software components or libraries used (e.g., Python, PyTorch, TensorFlow, etc.). |
| Experiment Setup | Yes | For drawing the tickets, we adopt a standard training protocol (Liu et al., 2018b) for both CIFAR-10 and CIFAR-100: the training takes 160 epochs in total with a batch size of 256; the initial learning rate is set to 0.1 and is divided by 10 at the 80th and 120th epochs, respectively; the SGD solver is adopted with a momentum of 0.9 and a weight decay of 10⁻⁴. For retraining the tickets, we keep the same setting by default. (A hedged configuration sketch follows the table.) |
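The Pseudocode row above refers to the paper's mask-distance-based search for early-bird (EB) tickets. Below is a minimal sketch of that idea, assuming channel masks are obtained by thresholding batch-normalization scaling factors and compared with a normalized Hamming distance; the helper names (`channel_mask`, `mask_distance`, `train_until_eb_ticket`) and the defaults (`prune_ratio=0.5`, `eps=0.1`, a queue of 5 recent masks) are illustrative assumptions, not the paper's exact implementation (see the official repository for that).

```python
from collections import deque

import torch
import torch.nn as nn


def channel_mask(model, prune_ratio):
    """Binary mask over BN channels: keep channels whose |gamma| exceeds the
    global threshold implied by the pruning ratio (a common channel-pruning
    proxy; the paper's exact criterion may differ)."""
    gammas = torch.cat([m.weight.data.abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    k = max(int(prune_ratio * gammas.numel()), 1)
    threshold = torch.kthvalue(gammas, k).values
    return (gammas > threshold).float()


def mask_distance(mask_a, mask_b):
    """Normalized Hamming distance between two binary channel masks."""
    return (mask_a != mask_b).float().mean().item()


def train_until_eb_ticket(model, loader, optimizer, criterion,
                          prune_ratio=0.5, eps=0.1, queue_len=5,
                          max_epochs=160):
    """Sketch of the EB-ticket search: stop early once recent pruning masks
    stabilize, i.e. the largest distance between the current mask and the
    last few masks drops below eps."""
    recent_masks = deque(maxlen=queue_len)
    for epoch in range(max_epochs):
        model.train()
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()

        mask = channel_mask(model, prune_ratio)
        if len(recent_masks) == queue_len and \
                max(mask_distance(mask, m) for m in recent_masks) < eps:
            return epoch, mask  # EB ticket drawn well before max_epochs
        recent_masks.append(mask)
    return max_epochs, channel_mask(model, prune_ratio)
```

The returned mask would then be used to prune the network, and only the resulting subnetwork is retrained to the target accuracy, which is where the reported energy savings come from.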
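The Experiment Setup row quotes the CIFAR training protocol (160 epochs, batch size 256, SGD with momentum 0.9 and weight decay 10⁻⁴, learning rate 0.1 divided by 10 at epochs 80 and 120). A hedged PyTorch sketch of that schedule is shown below; the framework, the data augmentation, and the use of `torchvision.models.vgg16_bn` as a stand-in for the paper's CIFAR-adapted VGG16 are assumptions, since the paper does not specify software dependencies.

```python
import torch
import torchvision
import torchvision.transforms as T

# Standard CIFAR-10 augmentation and normalization (assumed, not stated in the paper).
transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=256,
                                           shuffle=True, num_workers=4)

# Stand-in model; the paper uses a CIFAR variant of VGG16.
model = torchvision.models.vgg16_bn(num_classes=10)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
# Divide the learning rate by 10 at the 80th and 120th epochs; 160 epochs total.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[80, 120], gamma=0.1)

for epoch in range(160):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

Per the quoted setup, the same protocol is kept by default when retraining the drawn tickets.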