Training and Inference with Integers in Deep Neural Networks
Authors: Shuang Wu, Guoqi Li, Feng Chen, Luping Shi
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed framework is evaluated on the MNIST, CIFAR10, SVHN, and ImageNet datasets. Compared with methods that only discretize weights and activations at inference time, it achieves comparable accuracy and can further alleviate overfitting, indicating some form of regularization. WAGE produces a pure bidirectional low-precision integer dataflow for DNNs, which can be applied neatly for training and inference in dedicated hardware. |
| Researcher Affiliation | Academia | Shuang Wu¹, Guoqi Li¹, Feng Chen², Luping Shi¹; ¹Department of Precision Instrument, ²Department of Automation; Center for Brain Inspired Computing Research, Beijing Innovation Center for Future Chip, Tsinghua University; {lpshi,chenfeng}@mail.tsinghua.edu.cn |
| Pseudocode | Yes | Appendix A (Algorithm): We assume that network structures are defined and initialized with Equation 5. The annotations after the pseudocode are the potential corresponding operations for implementation in a fixed-point dataflow. Algorithm 1: Training an I-layer net with the WAGE method on a floating-point-based or integer-based device. Weights, activations, gradients and errors are quantized according to Equations 6–12. (A sketch of this quantization operator appears after the table.) |
| Open Source Code | Yes | We publish the code on GitHub: https://github.com/boluoweifenda/WAGE |
| Open Datasets | Yes | The proposed framework is evaluated on the MNIST, CIFAR10, SVHN, and ImageNet datasets. Our method is evaluated on MNIST, SVHN, CIFAR10 and ILSVRC12 (Russakovsky et al., 2015), and Table 1 shows the comparison results. |
| Dataset Splits | Yes | For the CIFAR10 dataset, we follow the data augmentation in Lee et al. (2015) for training: 4 pixels are padded on each side, and a 32 × 32 patch is randomly cropped from the padded image or its horizontal flip. For testing, only a single view of the original 32 × 32 image is evaluated. ... For testing, the single center crop in the validation set is evaluated. (A sketch of the described augmentation appears after the table.) |
| Hardware Specification | No | The paper makes a generic reference to "floating-point hardware like GPU" but does not specify any particular GPU models, CPU models, or other hardware components used for running the experiments. It lacks specific details such as model numbers, memory, or processor types. |
| Software Dependencies | No | We first build the computation graph for a vanilla network, then insert quantization nodes in forward propagation and override gradients in backward propagation for each layer on TensorFlow (Abadi et al., 2016). The paper mentions TensorFlow but does not provide a specific version number or any other software dependencies with their versions. (An illustrative sketch of the gradient-override pattern appears after the table.) |
| Experiment Setup | Yes | In this section, we set the W-A-G-E bits to 2-8-8-8 as the default for all layers in a CNN or MLP. The learning rate η in WAGE remains 1 for the whole 100 epochs. ... The model is trained with a mini-batch size of 128 for 300 epochs in total. The learning rate η is set to 8 and divided by 8 at epoch 200 and epoch 250. (These hyperparameters are consolidated in a sketch after the table.) |
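
The Pseudocode row references the quantization of weights, activations, gradients and errors per Equations 6–12. Below is a minimal NumPy sketch of the k-bit quantization operator, assuming the paper's Equation 6 definitions σ(k) = 2^(1−k) and Q(x, k) = clip(σ(k)·round(x/σ(k)), −1+σ(k), 1−σ(k)); the function names are ours, not the authors' identifiers.

```python
# Hedged sketch of the WAGE quantization operator Q(x, k) from Equation 6.
# Assumption: sigma(k) = 2^(1-k) is the smallest step on the k-bit grid.
import numpy as np

def sigma(k: int) -> float:
    """Unit distance of the k-bit grid spanning (-1, 1)."""
    return 2.0 ** (1 - k)

def quantize(x: np.ndarray, k: int) -> np.ndarray:
    """Round x onto the k-bit grid and clip to the representable range."""
    step = sigma(k)
    return np.clip(np.round(x / step) * step, -1.0 + step, 1.0 - step)

if __name__ == "__main__":
    x = np.array([-1.2, -0.3, 0.01, 0.49, 0.97])
    print(quantize(x, 2))  # 2-bit weights land on {-0.5, 0.0, 0.5}
    print(quantize(x, 8))  # 8-bit activations use a step of 2**-7
```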
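
For the Dataset Splits row, the quoted CIFAR10 augmentation (pad 4 pixels per side, random 32 × 32 crop, random horizontal flip) can be expressed in a few lines of tf.image code. This is a generic sketch of that description, not the augmentation code from the released repository.

```python
# Generic sketch of the CIFAR10 training augmentation described in the paper:
# pad 4 pixels on each side, take a random 32x32 crop, randomly flip horizontally.
import tensorflow as tf

def augment_cifar10(image: tf.Tensor) -> tf.Tensor:
    """image: a single 32x32x3 CIFAR10 image tensor."""
    image = tf.pad(image, paddings=[[4, 4], [4, 4], [0, 0]])  # -> 40x40x3
    image = tf.image.random_crop(image, size=[32, 32, 3])     # -> 32x32x3
    image = tf.image.random_flip_left_right(image)
    return image
```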
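
The Software Dependencies row quotes the paper's strategy of inserting quantization nodes in the forward pass and overriding their gradients in the backward pass. The snippet below is only an illustrative TensorFlow 2 sketch of that idea using a straight-through estimator via tf.custom_gradient; the released code targets TF 1.x graph mode, and the full WAGE gradient rules (which also quantize errors and gradients) are not reproduced here.

```python
# Illustrative sketch (not the authors' code): a quantization node whose
# backward pass is overridden with a straight-through estimator, mirroring
# "insert quantization nodes in forward propagation and override gradients".
import tensorflow as tf

def make_quantize_ste(k: int):
    """Return a k-bit quantization op with an identity (straight-through) gradient."""
    step = 2.0 ** (1 - k)

    @tf.custom_gradient
    def quantize_ste(x):
        # Forward: round onto the k-bit grid and clip to the representable range.
        y = tf.clip_by_value(tf.round(x / step) * step, -1.0 + step, 1.0 - step)

        def grad(dy):
            # Backward: treat rounding as identity so the error passes through;
            # WAGE additionally quantizes errors/gradients, which is omitted here.
            return dy

        return y, grad

    return quantize_ste

# Usage: wrap an activation with an 8-bit quantization node.
quantize_a = make_quantize_ste(8)
a = tf.random.uniform([4, 16], minval=-1.0, maxval=1.0)
a_q = quantize_a(a)
```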
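
Finally, the hyperparameters quoted in the Experiment Setup row can be consolidated in one place. The variable names below are ours; only the values quoted from the paper are used.

```python
# Consolidation of the quoted hyperparameters; names are illustrative.
WAGE_BITS = {"weights": 2, "activations": 8, "gradients": 8, "errors": 8}  # W-A-G-E = 2-8-8-8
BATCH_SIZE = 128
TOTAL_EPOCHS = 300  # CIFAR10 run quoted in the row above

def cifar10_learning_rate(epoch: int) -> float:
    """eta = 8, divided by 8 at epoch 200 and again at epoch 250."""
    if epoch < 200:
        return 8.0
    if epoch < 250:
        return 1.0
    return 0.125
```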