E2-Train: Training State-of-the-art CNNs with Over 80% Energy Savings

Authors: Yue Wang, Ziyu Jiang, Xiaohan Chen, Pengfei Xu, Yang Zhao, Yingyan Lin, Zhangyang Wang

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive simulations and ablation studies, with real energy measurements from an FPGA board, confirm the superiority of our proposed strategies and demonstrate remarkable energy savings for training.
Researcher Affiliation | Academia | Yue Wang, Ziyu Jiang, Xiaohan Chen, Pengfei Xu, Yang Zhao, Yingyan Lin, and Zhangyang Wang. Department of Computer Science and Engineering, Texas A&M University; Department of Electrical and Computer Engineering, Rice University. {jiangziyu, chernxh, atlaswang}@tamu.edu; {yw68, px5, zy34, yingyan.lin}@rice.edu
Pseudocode | No | The proposed framework
Open Source Code | No | https://rtml.eiclab.net/?page_id=120
Open Datasets | Yes | Datasets: We evaluate our proposed techniques on two datasets: CIFAR-10 and CIFAR-100.
Dataset Splits | No | Datasets: We evaluate our proposed techniques on two datasets: CIFAR-10 and CIFAR-100. Common data augmentation methods (e.g., mirroring/shifting) are adopted, and data are normalized as in [60]. (Augmentation sketch below the table.)
Hardware Specification | Yes | Specifically, unless otherwise specified, all the energy or energy savings are obtained through real measurements by training the corresponding models and datasets on a state-of-the-art FPGA [65], a Digilent ZedBoard Zynq-7000 ARM/FPGA SoC Development Board.
Software Dependencies | No | Specifically, we use SGD with a momentum of 0.9 and a weight decay factor of 0.0001, and the initialization introduced in [63]. Models are trained for 64k iterations.
Experiment Setup | Yes | Specifically, we use SGD with a momentum of 0.9 and a weight decay factor of 0.0001, and the initialization introduced in [63]. Models are trained for 64k iterations. For experiments where PSG is used, the initial learning rate is adjusted to 0.03, as SignSGD [20] suggested small learning rates to benefit convergence. For others, the learning rate is initially set to 0.1 and then decayed by 10 at the 32k and 48k iterations, respectively. (Training-schedule sketch below the table.)
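
The Dataset Splits row only quotes the paper's mention of mirroring/shifting augmentation and normalization as in [60], without concrete parameters. Below is a minimal PyTorch/torchvision sketch of such a CIFAR-10 pipeline; the 4-pixel pad-and-crop, the batch size, and the channel statistics are common-practice assumptions rather than values stated in the paper (CIFAR-100 would use its own statistics).

```python
# Sketch of the CIFAR data pipeline implied by the Dataset Splits row.
# Padding, batch size, and normalization statistics are assumptions, not paper values.
import torch
from torchvision import datasets, transforms

# Assumed CIFAR-10 per-channel statistics.
CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR10_STD = (0.2470, 0.2435, 0.2616)

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # "shifting" via pad-and-crop (assumed 4-pixel pad)
    transforms.RandomHorizontalFlip(),      # "mirroring"
    transforms.ToTensor(),
    transforms.Normalize(CIFAR10_MEAN, CIFAR10_STD),
])
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(CIFAR10_MEAN, CIFAR10_STD),
])

train_set = datasets.CIFAR10("./data", train=True, download=True, transform=train_transform)
test_set = datasets.CIFAR10("./data", train=False, download=True, transform=test_transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=128, shuffle=False, num_workers=4)
```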
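The Experiment Setup row does pin down the optimizer and schedule: SGD with momentum 0.9 and weight decay 0.0001, 64k training iterations, and an initial learning rate of 0.1 (0.03 when PSG is used) decayed by 10 at the 32k and 48k iterations. The sketch below wires those quoted numbers into a PyTorch training loop; the backbone, loss, batch size, and the synthetic stand-in data are placeholders, not the paper's actual setup.

```python
# Sketch of the training schedule from the Experiment Setup row.
# Only the hyperparameters in the comments marked "paper:" come from the quoted text.
import itertools
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18  # placeholder backbone, not the paper's model

# Synthetic stand-in for a CIFAR train loader (e.g., the pipeline sketched above),
# so this snippet runs on its own.
images = torch.randn(512, 3, 32, 32)
labels = torch.randint(0, 10, (512,))
train_loader = DataLoader(TensorDataset(images, labels), batch_size=128, shuffle=True)

use_psg = False  # PSG experiments start from the smaller learning rate
model = resnet18(num_classes=10)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.03 if use_psg else 0.1,   # paper: 0.1 normally, 0.03 with PSG
    momentum=0.9,                  # paper: momentum 0.9
    weight_decay=1e-4,             # paper: weight decay 0.0001
)
# Decay by 10 at 32k and 48k iterations, so the scheduler is stepped once per batch.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[32_000, 48_000], gamma=0.1)

data_iter = itertools.cycle(train_loader)
model.train()
for step in range(64_000):         # paper: 64k training iterations
    x, y = next(data_iter)
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()
```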