Training and Inference with Integers in Deep Neural Networks

Authors: Shuang Wu, Guoqi Li, Feng Chen, Luping Shi

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The proposed framework is evaluated on the MNIST, CIFAR10, SVHN, and ImageNet datasets. Compared with methods that only discretize weights and activations at inference time, it has comparable accuracy and can further alleviate overfitting, indicating some type of regularization. WAGE produces a pure bidirectional low-precision integer dataflow for DNNs, which can be applied neatly for training and inference in dedicated hardware. (A minimal quantizer sketch follows the table.)
Researcher Affiliation | Academia | Shuang Wu (1), Guoqi Li (1), Feng Chen (2), Luping Shi (1); (1) Department of Precision Instrument, (2) Department of Automation; Center for Brain Inspired Computing Research, Beijing Innovation Center for Future Chip, Tsinghua University; {lpshi,chenfeng}@mail.tsinghua.edu.cn
Pseudocode | Yes | Appendix A (Algorithm): We assume that network structures are defined and initialized with Equation 5. The annotations after the pseudocode are potential corresponding operations for implementation in a fixed-point dataflow. Algorithm 1: Training an I-layer net with the WAGE method on a floating-point-based or integer-based device; weights, activations, gradients and errors are quantized according to Equations 6–12. (A training-step skeleton follows the table.)
Open Source Code | Yes | We publish the code on GitHub: https://github.com/boluoweifenda/WAGE
Open Datasets | Yes | The proposed framework is evaluated on the MNIST, CIFAR10, SVHN, and ImageNet datasets. Our method is evaluated on MNIST, SVHN, CIFAR10 and ILSVRC12 (Russakovsky et al., 2015) and Table 1 shows the comparison results.
Dataset Splits | Yes | For the CIFAR10 dataset, we follow the data augmentation in Lee et al. (2015) for training: 4 pixels are padded on each side, and a 32 × 32 patch is randomly cropped from the padded image or its horizontal flip. For testing, only a single view of the original 32 × 32 image is evaluated. ... For testing, the single center crop of the validation set is evaluated. (An augmentation sketch follows the table.)
Hardware Specification | No | The paper makes a generic reference to "floating-point hardware like GPU" but does not specify any particular GPU models, CPU models, or other hardware components used for running the experiments. It lacks specific details such as model numbers, memory, or processor types.
Software Dependencies | No | We first build the computation graph for a vanilla network, then insert quantization nodes in forward propagation and override gradients in backward propagation for each layer on TensorFlow (Abadi et al., 2016). The paper mentions TensorFlow but does not provide a specific version number or any other software dependencies with their versions. (A gradient-override sketch follows the table.)
Experiment Setup | Yes | In this section, we set W-A-G-E bits to 2-8-8-8 as default for all layers in a CNN or MLP. The learning rate η in WAGE remains 1 for the whole 100 epochs. ... The model is trained with a mini-batch size of 128 for 300 epochs in total. Learning rate η is set to 8 and divided by 8 at epoch 200 and epoch 250. (The schedule is sketched after the table.)
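
The rows above refer repeatedly to WAGE's quantization of weights, activations, gradients, and errors. The following is a minimal sketch of a k-bit linear quantizer in the spirit of the paper's Q(x, k); the step size σ(k) = 2^(1-k) and the clipping range are assumptions based on the paper's description, not a verbatim copy of Equations 6–12.

```python
import numpy as np

def quantize(x, k):
    """Sketch of a k-bit linear quantizer in the spirit of WAGE's Q(x, k).

    Assumption: values snap to a uniform grid with step sigma = 2**(1 - k)
    and are clipped to (-1, 1); the paper's exact Equations 6-12 add
    per-layer scaling and shift factors that are omitted here.
    """
    sigma = 2.0 ** (1 - k)
    y = sigma * np.round(x / sigma)
    return np.clip(y, -1.0 + sigma, 1.0 - sigma)

# Example: 2-bit weights (the "W" in the default 2-8-8-8 setting)
w = np.array([-0.7, -0.2, 0.1, 0.6])
print(quantize(w, 2))   # values snap to the {-0.5, 0.0, 0.5} grid
```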
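
The Pseudocode row summarizes Algorithm 1, which quantizes weights, activations, errors, and gradients at every layer during training. The toy step below is a hedged sketch of that dataflow on a single linear layer: it reuses the quantizer above, substitutes plain SGD for the paper's shift-based update, and omits the per-layer scale factors; none of the variable names come from the WAGE repository.

```python
import numpy as np

def quantize(x, k):                       # same k-bit helper as in the sketch above
    s = 2.0 ** (1 - k)
    return np.clip(s * np.round(x / s), -1.0 + s, 1.0 - s)

# Toy single linear layer trained with a quantized W/A/G/E dataflow.
# Illustrative only: not Algorithm 1 verbatim.
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 2))
x = quantize(0.5 * rng.standard_normal((8, 4)), 8)   # 8-bit inputs
t = quantize(0.5 * rng.standard_normal((8, 2)), 8)   # 8-bit targets

for step in range(100):
    Wq = quantize(W, 2)                  # W: 2-bit weights used in the forward pass
    a = quantize(x @ Wq, 8)              # A: 8-bit activations
    e = quantize(a - t, 8)               # E: 8-bit backpropagated error
    g = quantize(x.T @ e / len(x), 8)    # G: 8-bit weight gradient
    W -= 0.1 * g                         # higher-precision weight buffer, SGD stand-in
```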
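
The Dataset Splits row quotes the CIFAR10 augmentation: pad 4 pixels per side, take a random 32 × 32 crop (or its horizontal flip) for training, and evaluate only the original 32 × 32 view at test time. Below is a minimal NumPy sketch of that procedure, not code from the WAGE repository.

```python
import numpy as np

def augment_cifar10(image, rng):
    """Training-time augmentation as described (Lee et al., 2015 style):
    pad 4 pixels per side, random 32x32 crop, random horizontal flip."""
    padded = np.pad(image, ((4, 4), (4, 4), (0, 0)), mode="constant")  # 40x40x3
    top = rng.integers(0, 9)    # 40 - 32 + 1 = 9 possible offsets
    left = rng.integers(0, 9)
    crop = padded[top:top + 32, left:left + 32, :]
    if rng.random() < 0.5:
        crop = crop[:, ::-1, :]  # horizontal flip
    return crop

# Test time: only the single original 32x32 view is evaluated (no augmentation).
rng = np.random.default_rng(0)
img = np.zeros((32, 32, 3), dtype=np.float32)
aug = augment_cifar10(img, rng)   # shape (32, 32, 3)
```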
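
The Software Dependencies row notes that quantization nodes are inserted in forward propagation and gradients are overridden in backward propagation on TensorFlow. The sketch below shows one common way to express such an override with tf.custom_gradient (a straight-through pass for the rounding step); the 2018 codebase targeted TF 1.x, and WAGE's actual backward rule additionally re-quantizes the error itself, so this is illustrative only.

```python
import tensorflow as tf

@tf.custom_gradient
def quantize_ste(x):
    """Forward: snap x to an 8-bit grid in (-1, 1).
    Backward: pass the incoming gradient through unchanged (straight-through).
    Illustrates the 'insert quantization nodes / override gradients' idea;
    not the exact mechanism used in the WAGE repository."""
    sigma = 2.0 ** (1 - 8)
    y = tf.clip_by_value(sigma * tf.round(x / sigma), -1.0 + sigma, 1.0 - sigma)
    def grad(dy):
        return dy
    return y, grad

x = tf.Variable([[0.3, -0.7]])
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(quantize_ste(x) ** 2)
print(tape.gradient(loss, x))   # nonzero thanks to the overridden gradient
```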
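
The Experiment Setup row gives the reported hyperparameters for the 300-epoch run: W-A-G-E bits 2-8-8-8, mini-batch size 128, and η = 8 divided by 8 at epochs 200 and 250. A small configuration sketch follows; the variable names are ours, not the repository's, and η is applied inside WAGE's update rule rather than as a conventional floating-point learning rate.

```python
# Hedged sketch of the reported CIFAR10-style schedule.
config = {
    "bits": {"W": 2, "A": 8, "G": 8, "E": 8},
    "batch_size": 128,
    "epochs": 300,
}

def learning_rate(epoch, base=8.0):
    """eta = 8, divided by 8 at epoch 200 and again at epoch 250."""
    if epoch >= 250:
        return base / 64.0
    if epoch >= 200:
        return base / 8.0
    return base

assert learning_rate(0) == 8.0 and learning_rate(200) == 1.0 and learning_rate(250) == 0.125
```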