Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations
Authors: Yichi Zhang, Ritchie Zhao, Weizhe Hua, Nayun Xu, G. Edward Suh, Zhiru Zhang
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments indicate that PG achieves excellent results on CNNs, including statically compressed mobile-friendly networks such as ShuffleNet. Compared to the state-of-the-art prediction-based quantization schemes, PG achieves the same or higher accuracy with 2.4× less compute on ImageNet. PG furthermore applies to RNNs. Compared to 8-bit uniform quantization, PG obtains a 1.2% improvement in perplexity per word with 2.7× computational cost reduction on LSTM on the Penn Treebank dataset. |
| Researcher Affiliation | Academia | Yichi Zhang (Cornell University, yz2499@cornell.edu); Ritchie Zhao (Cornell University, rz252@cornell.edu); Weizhe Hua (Cornell University, wh399@cornell.edu); Nayun Xu (Cornell Tech, nx38@cornell.edu); G. Edward Suh (Cornell University, edward.suh@cornell.edu); Zhiru Zhang (Cornell University, zhiruz@cornell.edu) |
| Pseudocode | No | No pseudocode or algorithm blocks found. The paper describes the method using equations and explanatory text; an illustrative sketch of the dual-precision gating idea is given below the table. |
| Open Source Code | No | We will release the source code on the author's website. |
| Open Datasets | Yes | We evaluate PG using ResNet-20 (He et al., 2016a) and ShiftNet-20 (Wu et al., 2018a) on CIFAR-10 (Krizhevsky & Hinton, 2009), and ShuffleNet V2 (Ma et al., 2018) on ImageNet (Deng et al., 2009). We also test an LSTM model (Hochreiter & Schmidhuber, 1997) on the Penn Treebank (PTB) (Marcus et al., 1993) corpus. |
| Dataset Splits | No | The paper gives training details but no explicit train/validation/test splits. For example, the statement "On CIFAR-10, the batch size is 128, and the models are trained for 200 epochs" does not specify how the data are split. |
| Hardware Specification | Yes | All experiments are conducted on TensorFlow (Abadi et al., 2016) with NVIDIA GeForce 1080 Ti GPUs. One of the Titan Xp GPUs used for this research was donated by the NVIDIA Corporation. Intel Xeon Silver 4114 CPU (2.20 GHz). |
| Software Dependencies | No | All experiments are conducted on TensorFlow (Abadi et al., 2016) with NVIDIA GeForce 1080 Ti GPUs. We implement the SDDMM kernel in Python leveraging a high-performance JIT compiler, Numba (Lam et al., 2015). No version numbers for TensorFlow or Numba are specified. |
| Experiment Setup | Yes | On CIFAR-10, the batch size is 128, and the models are trained for 200 epochs. The initial learning rate is 0.1 and decays at epochs 100, 150, and 200 by a factor of 0.1 (i.e., the learning rate is multiplied by 0.1). On ImageNet, the batch size is 512 and the models are trained for 120 epochs. The learning rate decays linearly from an initial value of 0.5 to 0. The number of hidden units in the LSTM cell is set to 300, and the number of layers is set to 1. The full bitwidth B... The prediction bitwidth Bhb... The penalty factor σ... The gating target δ... The coefficient α in the backward pass... An illustrative sketch of these learning-rate schedules is given below the table. |
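
Since the paper itself contains no pseudocode, the following is a minimal NumPy sketch of the dual-precision idea as described in the paper's text: each activation is first evaluated with only its most-significant bits, and the more expensive low-bit update is applied only where this cheap prediction exceeds a gating threshold. All names (`quantize`, `pg_dense`, `delta`, `scale`) and the uniform-quantization helper are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def quantize(x, bits, scale=1.0):
    """Crude uniform quantizer for non-negative activations (an illustrative
    stand-in for keeping the most-significant bits of a fixed-point value)."""
    levels = 2 ** bits - 1
    return np.round(np.clip(x / scale, 0.0, 1.0) * levels) / levels * scale

def pg_dense(x, w, full_bits=8, pred_bits=4, delta=0.0, scale=1.0):
    """Dual-precision dense layer: cheap high-bit prediction, gated low-bit update."""
    # Prediction phase: only the top `pred_bits` of each activation are used.
    x_hb = quantize(x, pred_bits, scale)
    y_hb = x_hb @ w
    # Residual carried by the remaining low-order bits.
    x_lb = quantize(x, full_bits, scale) - x_hb
    # Gate: outputs whose prediction exceeds `delta` are refined toward full precision.
    gate = (y_hb > delta).astype(x.dtype)
    # Note: a real implementation would skip the masked-out low-bit work entirely;
    # this sketch computes it densely and masks it only to show the arithmetic.
    return y_hb + gate * (x_lb @ w)

# Tiny usage example with random data.
rng = np.random.default_rng(0)
x = rng.random((4, 16)).astype(np.float32)                 # post-ReLU-style activations in [0, 1)
w = (rng.standard_normal((16, 8)) * 0.1).astype(np.float32)
y = pg_dense(x, w, delta=0.1)
```

The compute savings reported in the paper come from skipping the low-bit multiply-accumulates at the gated-off outputs; the dense masking above is only for clarity.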
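
For the experiment-setup row, the reported learning-rate schedules can be written out in plain Python so they are unambiguous. The constants come from the quoted setup; the function names are ours, not the authors' code.

```python
def cifar10_lr(epoch, base_lr=0.1):
    """Step schedule: batch size 128, 200 epochs, LR multiplied by 0.1 at epochs 100, 150, 200."""
    lr = base_lr
    for milestone in (100, 150, 200):
        if epoch >= milestone:
            lr *= 0.1
    return lr

def imagenet_lr(epoch, base_lr=0.5, total_epochs=120):
    """Linear decay: batch size 512, LR goes from 0.5 to 0 over 120 epochs."""
    return base_lr * max(0.0, 1.0 - epoch / total_epochs)
```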