CPT: Efficient Deep Neural Network Training via Cyclic Precision
Authors: Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive simulations and ablation studies on five datasets and eleven models demonstrate that CPT's effectiveness is consistent across various models/tasks (including classification and language modeling). |
| Researcher Affiliation | Collaboration | Yonggan Fu, Han Guo, Xin Yang, Yining Ding & Yingyan Lin, Department of Electrical and Computer Engineering, Rice University ({yf22, hg31, xy33, yd31, yingyan.lin}@rice.edu); Meng Li & Vikas Chandra, Facebook Inc ({meng.li, vchandra}@fb.com) |
| Pseudocode | No | The paper provides mathematical formulas and descriptions of procedures, but it does not include a clearly labeled pseudocode block or algorithm. |
| Open Source Code | Yes | Our codes are available at: https://github.com/RICE-EIC/CPT. |
| Open Datasets | Yes | We consider eleven models... and five tasks (including CIFAR-10/100 (Krizhevsky et al., 2009), ImageNet (Deng et al., 2009), WikiText-103 (Merity et al., 2016), and Penn Treebank (PTB) (Marcus et al., 1993)). |
| Dataset Splits | Yes | We follow the standard training setting in all the experiments. In particular, for classification tasks, we follow SOTA settings in (Wang et al., 2018b) for CIFAR-10/100 and (He et al., 2016) for ImageNet experiments, respectively; and for language modeling tasks, we follow (Baevski & Auli, 2018) for Transformer on WikiText-103 and (Merity et al., 2017) for LSTM on PTB. |
| Hardware Specification | Yes | Specifically, we employ the Vivado HLx design flow to implement FPGA-based accelerators on a Xilinx development board called ZC706 (Xilinx). |
| Software Dependencies | No | The paper mentions using "Vivado HLx design flow" but does not provide a specific version number for this or any other software dependency. |
| Experiment Setup | Yes | In particular, for classification tasks, we follow SOTA settings in (Wang et al., 2018b) for CIFAR-10/100 and (He et al., 2016) for ImageNet experiments, respectively; and for language modeling tasks, we follow (Baevski & Auli, 2018) for Transformer on WikiText-103 and (Merity et al., 2017) for LSTM on PTB. The lower precision bounds in all the experiments are set using the PRT in Sec. 3.3 and the upper bound is the same as the precision of the corresponding static precision baselines. We only apply CPT to the weights and activations (together annotated as FW) and use static precision for the errors and gradients (together annotated as BW)... The total number of periodic precision cycles, i.e., N in Sec. 3.3, for all the experiments is fixed to be 32. (An illustrative sketch of such a cyclic schedule follows the table.) |
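
Since the paper describes the procedure in formulas rather than pseudocode (see the Pseudocode row), the snippet below is a minimal sketch of a cosine-style cyclic precision schedule under the setup quoted in the Experiment Setup row: cyclic precision applied only to weights/activations (FW), static precision for errors/gradients (BW), N = 32 cycles, lower bound from the PRT, upper bound matching the static-precision baseline. Function and parameter names (`cyclic_precision`, `bit_min`, `bit_max`, the 3-8 bit range) are illustrative assumptions, not taken from the paper or the released code.

```python
import math

def cyclic_precision(step, total_steps, num_cycles=32, bit_min=3, bit_max=8):
    """Sketch of a cosine-style cyclic precision schedule.

    Cycles the forward-pass (weights/activations) bit-width between
    `bit_min` and `bit_max` over `num_cycles` periods of training,
    mirroring the quoted setup of N = 32 cycles with the lower bound
    chosen via the PRT and the upper bound equal to the static-precision
    baseline. Errors/gradients would keep a fixed precision separately.
    Exact formula, rounding, and names here are illustrative.
    """
    cycle_len = total_steps / num_cycles
    # Position within the current cycle, in [0, 1).
    pos = (step % cycle_len) / cycle_len
    # Cosine ramp from bit_min up toward bit_max within each cycle.
    bits = bit_min + 0.5 * (bit_max - bit_min) * (1 - math.cos(math.pi * pos))
    return int(round(bits))

# Example: 160 training epochs, 32 precision cycles, 3-8 bit range.
schedule = [cyclic_precision(e, 160) for e in range(160)]
```

In this sketch the returned bit-width would be fed to whatever quantizer the training loop uses for weights and activations at each epoch; the hyperparameter values above are placeholders, not the paper's reported settings.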