E2-Train: Training State-of-the-art CNNs with Over 80% Energy Savings
Authors: Yue Wang, Ziyu Jiang, Xiaohan Chen, Pengfei Xu, Yang Zhao, Yingyan Lin, Zhangyang Wang
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive simulations and ablation studies, with real energy measurements from an FPGA board, confirm the superiority of our proposed strategies and demonstrate remarkable energy savings for training. |
| Researcher Affiliation | Academia | Yue Wang, Ziyu Jiang, Xiaohan Chen, Pengfei Xu, Yang Zhao, Yingyan Lin and Zhangyang Wang; Department of Computer Science and Engineering, Texas A&M University; Department of Electrical and Computer Engineering, Rice University; {jiangziyu, chernxh, atlaswang}@tamu.edu; {yw68, px5, zy34, yingyan.lin}@rice.edu |
| Pseudocode | No | The proposed framework |
| Open Source Code | No | https://rtml.eiclab.net/?page_id=120 |
| Open Datasets | Yes | Datasets: We evaluate our proposed techniques on two datasets: CIFAR-10 and CIFAR-100. |
| Dataset Splits | No | Datasets: We evaluate our proposed techniques on two datasets: CIFAR-10 and CIFAR-100. Common data augmentation methods (e.g., mirroring/shifting) are adopted, and data are normalized as in [60]. |
| Hardware Specification | Yes | Specifically, unless otherwise specified, all the energy or energy savings are obtained through real measurements by training the corresponding models and datasets on a state-of-the-art FPGA [65], which is a Digilent ZedBoard Zynq-7000 ARM/FPGA SoC Development Board. |
| Software Dependencies | No | Specifically, we use SGD with a momentum of 0.9 and a weight decay factor of 0.0001, and the initialization introduced in [63]. Models are trained for 64k iterations. |
| Experiment Setup | Yes | Specifically, we use SGD with a momentum of 0.9 and a weight decay factor of 0.0001, and the initialization introduced in [63]. Models are trained for 64k iterations. For experiments where PSG is used, the initial learning rate is adjusted to 0.03, as SignSGD [20] suggested that small learning rates benefit convergence. For others, the learning rate is initially set to 0.1 and then decayed by a factor of 10 at the 32k and 48k iterations, respectively. (A minimal sketch of this setup appears below the table.) |
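
The following is a minimal sketch of the quoted training setup, assuming PyTorch and a standard ResNet backbone on CIFAR-10; the framework, the `resnet18` model choice, the batch size, and the normalization statistics are assumptions not stated in the paper, and E2-Train's energy-saving techniques (e.g., stochastic mini-batch dropping, PSG) are not implemented here.

```python
# Hedged sketch: SGD (momentum 0.9, weight decay 1e-4), 64k iterations,
# LR 0.1 decayed by 10x at 32k and 48k iterations (0.03 initial LR for PSG runs).
# Framework (PyTorch), model, batch size, and normalization stats are assumptions.
import torch
import torchvision
import torchvision.transforms as T

# Common CIFAR augmentation (mirroring/shifting) plus per-channel normalization.
transform = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomCrop(32, padding=4),
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                     shuffle=True, num_workers=2)

model = torchvision.models.resnet18(num_classes=10)  # placeholder backbone

use_psg = False  # PSG experiments use a smaller initial learning rate
init_lr = 0.03 if use_psg else 0.1

# SGD with momentum 0.9 and weight decay 1e-4, as quoted above.
optimizer = torch.optim.SGD(model.parameters(), lr=init_lr,
                            momentum=0.9, weight_decay=1e-4)
# Learning rate decayed by 10x at the 32k and 48k iteration marks.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[32_000, 48_000],
                                                 gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()

iteration, max_iters = 0, 64_000
while iteration < max_iters:
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        scheduler.step()  # schedule is stepped per iteration, not per epoch
        iteration += 1
        if iteration >= max_iters:
            break
```

Note that the schedule is iteration-based rather than epoch-based, matching the paper's description of decay at the 32k and 48k iterations.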