CPT: Efficient Deep Neural Network Training via Cyclic Precision

Authors: Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive simulations and ablation studies on five datasets and eleven models demonstrate that CPT's effectiveness is consistent across various models/tasks (including classification and language modeling).
Researcher Affiliation | Collaboration | Yonggan Fu, Han Guo, Xin Yang, Yining Ding & Yingyan Lin, Department of Electrical and Computer Engineering, Rice University ({yf22, hg31, xy33, yd31, yingyan.lin}@rice.edu); Meng Li & Vikas Chandra, Facebook Inc ({meng.li, vchandra}@fb.com)
Pseudocode | No | The paper provides mathematical formulas and descriptions of procedures, but it does not include a clearly labeled pseudocode block or algorithm.
Open Source Code | Yes | Our codes are available at: https://github.com/RICE-EIC/CPT.
Open Datasets | Yes | We consider eleven models... and five tasks (including CIFAR-10/100 (Krizhevsky et al., 2009), ImageNet (Deng et al., 2009), WikiText-103 (Merity et al., 2016), and Penn Treebank (PTB) (Marcus et al., 1993)).
Dataset Splits | Yes | We follow the standard training setting in all the experiments. In particular, for classification tasks, we follow SOTA settings in (Wang et al., 2018b) for CIFAR-10/100 and (He et al., 2016) for ImageNet experiments, respectively; and for language modeling tasks, we follow (Baevski & Auli, 2018) for Transformer on WikiText-103 and (Merity et al., 2017) for LSTM on PTB.
Hardware Specification | Yes | Specifically, we employ the Vivado HLx design flow to implement FPGA-based accelerators on a Xilinx development board called ZC706 (Xilinx).
Software Dependencies | No | The paper mentions using the "Vivado HLx design flow" but does not provide a specific version number for this or any other software dependency.
Experiment Setup | Yes | In particular, for classification tasks, we follow SOTA settings in (Wang et al., 2018b) for CIFAR-10/100 and (He et al., 2016) for ImageNet experiments, respectively; and for language modeling tasks, we follow (Baevski & Auli, 2018) for Transformer on WikiText-103 and (Merity et al., 2017) for LSTM on PTB. The lower precision bounds in all the experiments are set using the PRT in Sec. 3.3 and the upper bound is the same as the precision of the corresponding static precision baselines. We only apply CPT to the weights and activations (together annotated as FW) and use static precision for the errors and gradients (together annotated as BW)... The total number of periodic precision cycles, i.e., N in Sec. 3.3, for all the experiments is fixed to be 32.
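
The experiment-setup row above pins down the key schedule hyperparameters (a PRT-derived lower precision bound, an upper bound equal to the static-precision baseline, and N = 32 precision cycles applied only to the forward weights/activations). Below is a minimal Python sketch of a cosine-style cyclic precision schedule consistent with that description; the exact rounding, the 3-bit/8-bit bounds, and the total step count are illustrative assumptions, not values taken from this report.

```python
import math

def cyclic_precision(step, total_steps, num_cycles=32, bit_min=3, bit_max=8):
    """Sketch of a cosine-style cyclic precision schedule (not the authors' exact code).

    Precision (in bits) rises from `bit_min` to `bit_max` within each cycle;
    `num_cycles` mirrors the N = 32 used in the paper's experiments. The
    `bit_min`/`bit_max` defaults are illustrative placeholders: per the quoted
    setup, the lower bound comes from the Precision Range Test (PRT) and the
    upper bound matches the static-precision baseline's bit-width.
    """
    cycle_len = total_steps / num_cycles
    pos = (step % cycle_len) / cycle_len  # position within the current cycle, in [0, 1)
    bits = bit_min + 0.5 * (bit_max - bit_min) * (1 - math.cos(math.pi * pos))
    return int(round(bits))

# Example: query the schedule at a few training steps of a hypothetical 64k-step run.
if __name__ == "__main__":
    total = 64000
    for step in (0, 500, 1000, 1500, 1999):
        print(step, cyclic_precision(step, total))
```

In the setup quoted above, such a schedule would drive only the forward-pass weight/activation precision (FW), while the backward errors and gradients (BW) keep a static precision throughout training.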