E2-Train: Training State-of-the-art CNNs with Over 80% Energy Savings
Authors: Yue Wang, Ziyu Jiang, Xiaohan Chen, Pengfei Xu, Yang Zhao, Yingyan Lin, Zhangyang Wang
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive simulations and ablation studies, with real energy measurements from an FPGA board, confirm the superiority of our proposed strategies and demonstrate remarkable energy savings for training. |
| Researcher Affiliation | Academia | Yue Wang, Ziyu Jiang, Xiaohan Chen, Pengfei Xu, Yang Zhao, Yingyan Lin and Zhangyang Wang; Department of Computer Science and Engineering, Texas A&M University; Department of Electrical and Computer Engineering, Rice University; {jiangziyu, chernxh, atlaswang}@tamu.edu; {yw68, px5, zy34, yingyan.lin}@rice.edu |
| Pseudocode | No | The proposed framework |
| Open Source Code | No | https://rtml.eiclab.net/?page_id=120 |
| Open Datasets | Yes | Datasets: We evaluate our proposed techniques on two datasets: CIFAR-10 and CIFAR-100. |
| Dataset Splits | No | Datasets: We evaluate our proposed techniques on two datasets: CIFAR-10 and CIFAR-100. Common data augmentation methods (e.g., mirroring/shifting) are adopted, and data are normalized as in [60]. |
| Hardware Specification | Yes | Specifically, unless otherwise specified, all the energy or energy savings are obtained through real measurements by training the corresponding models and datasets on a state-of-the-art FPGA [65], which is a Digilent ZedBoard Zynq-7000 ARM/FPGA SoC Development Board. |
| Software Dependencies | No | Specifically, we use SGD with a momentum of 0.9 and a weight decay factor of 0.0001, and the initialization introduced in [63]. Models are trained for 64k iterations. |
| Experiment Setup | Yes | Specifically, we use SGD with a momentum of 0.9 and a weight decay factor of 0.0001, and the initialization introduced in [63]. Models are trained for 64k iterations. For experiments where PSG is used, the initial learning rate is adjusted to 0.03, as SignSGD [20] suggested that small learning rates benefit convergence. For others, the learning rate is initially set to 0.1 and then decayed by a factor of 10 at the 32k and 48k iterations, respectively. (A minimal sketch of this setup appears below the table.) |
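
The following is a minimal sketch of the quoted training setup, assuming PyTorch and a standard ResNet backbone on CIFAR-10; the framework, the `resnet18` model choice, the batch size, and the normalization statistics are assumptions not stated in the paper, and E2-Train's energy-saving techniques (e.g., stochastic mini-batch dropping, PSG) are not implemented here.

```python
# Hedged sketch: SGD (momentum 0.9, weight decay 1e-4), 64k iterations,
# LR 0.1 decayed by 10x at 32k and 48k iterations (0.03 initial LR for PSG runs).
# Framework (PyTorch), model, batch size, and normalization stats are assumptions.
import torch
import torchvision
import torchvision.transforms as T

# Common CIFAR augmentation (mirroring/shifting) plus per-channel normalization.
transform = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomCrop(32, padding=4),
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                     shuffle=True, num_workers=2)

model = torchvision.models.resnet18(num_classes=10)  # placeholder backbone

use_psg = False  # PSG experiments use a smaller initial learning rate
init_lr = 0.03 if use_psg else 0.1

# SGD with momentum 0.9 and weight decay 1e-4, as quoted above.
optimizer = torch.optim.SGD(model.parameters(), lr=init_lr,
                            momentum=0.9, weight_decay=1e-4)
# Learning rate decayed by 10x at the 32k and 48k iteration marks.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[32_000, 48_000],
                                                 gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()

iteration, max_iters = 0, 64_000
while iteration < max_iters:
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        scheduler.step()  # schedule is stepped per iteration, not per epoch
        iteration += 1
        if iteration >= max_iters:
            break
```

Note that the schedule is iteration-based rather than epoch-based, matching the paper's description of decay at the 32k and 48k iterations.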