Is Integer Arithmetic Enough for Deep Learning Training?

Authors: Alireza Ghaffari, Marzieh S. Tahaei, Mohammadreza Tayaranian, Masoud Asgharian, Vahid Partovi Nia

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical and mathematical results reveal that integer arithmetic seems to be enough to train deep learning models. Our experimental results show that our proposed method is effective in a wide variety of tasks such as classification (including vision transformers), object detection, and semantic segmentation.
Researcher Affiliation | Collaboration | Alireza Ghaffari1, Marzieh S. Tahaei1, Mohammadreza Tayaranian1, Masoud Asgharian2, Vahid Partovi Nia1. 1 Huawei Noah's Ark Lab, Montreal Research Center; 2 Department of Mathematics and Statistics, McGill University. {alireza.ghaffari, marzieh.tahaei, mohammadreza.tayaranian}@huawei.com, vahid.partovinia@huawei.com, masoud.asgharian2@mcgill.ca
Pseudocode | No | The paper includes figures illustrating the process, but it contains no clearly labeled 'Pseudocode' or 'Algorithm' section, nor any structured code blocks.
Open Source Code | No | The code is proprietary; however, we will provide it upon publication.
Open Datasets | Yes | Image classification: Table 1 reports the experimental results of our proposed integer training method on the conventional vision classification models. In this set of models, we used int8 linear layer, int8 convolutional layer, int8 batch-norm layer, and int16 SGD to form a fully integer training pipeline. ... smaller vision models such as ResNet18 and MobileNetV2 ... on ImageNet). Vision transformer: We validated the applicability of our proposed integer training on the original vision transformer model, notably ViT-B-16-224 [15]. ... fine-tune the model on CIFAR10. Semantic segmentation: ... DeepLab V1/V2 with ResNet-101 ... PASCAL VOC-2012 [17] and MS COCO 10K [18]. Object detection: ... MMDetection [22] toolbox ... MS COCO 10K [18], PASCAL VOC-2007 [23], VOC-2012 [17], and Cityscapes [24] datasets.
Dataset Splits | Yes | For the ImageNet classification experiments, we used the standard training and validation set of ImageNet with the standard split. For the CIFAR10 and CIFAR100 classification, we used the training and test set of the dataset with the standard split.
Hardware Specification | Yes | For ImageNet experiments, we used 8 NVIDIA Tesla V100 GPUs with 32 GB of memory, and for smaller datasets, we used 2 NVIDIA Tesla V100 GPUs with 32 GB of memory.
Software Dependencies | Yes | Our experiments are based on the PyTorch framework (version 1.10.0), and we have used CUDA (version 11.3) and cuDNN (version 8.2.0.100).
Experiment Setup | Yes | For the ImageNet classification experiments, we used the standard training and validation set of ImageNet with the standard split. We used Stochastic Gradient Descent (SGD) with momentum 0.9 and weight decay 1e-4. The initial learning rate is 0.1, and it is reduced by a factor of 10 at epochs 30, 60, and 90. We trained models for 100 epochs with a batch size of 256.
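The fully integer pipeline quoted above (int8 linear, convolutional, and batch-norm layers with an int16 SGD) rests on integer multiply-accumulate arithmetic. The following is a minimal sketch of that idea for a single dot product; the symmetric per-tensor scaling, the function names, and the scale values are illustrative assumptions, not the authors' actual quantization scheme.

```python
# Sketch of int8 arithmetic for a linear layer: int8 x int8 products are
# accumulated in a wide (int32-range) accumulator, and one float multiply
# at the end recovers the real-valued result. The symmetric per-tensor
# scaling below is an assumption for illustration only.

def quantize_int8(values, scale):
    """Map float values to int8 codes via symmetric scaling, saturating
    to the int8 range [-128, 127]."""
    return [max(-128, min(127, round(v / scale))) for v in values]

def int8_linear(x_q, w_q, x_scale, w_scale):
    """Integer dot product of quantized activations and weights."""
    acc = 0  # accumulator; would be int32 on real hardware
    for xq, wq in zip(x_q, w_q):
        acc += xq * wq  # pure integer multiply-accumulate
    return acc * (x_scale * w_scale)  # single dequantization step

# Example: quantize, run the integer layer, compare to the float result.
x = [0.5, -1.25, 2.0]
w = [1.0, 0.75, -0.5]
x_q = quantize_int8(x, scale=0.02)  # int8 codes for activations
w_q = quantize_int8(w, scale=0.01)  # int8 codes for weights
approx = int8_linear(x_q, w_q, 0.02, 0.01)
exact = sum(a * b for a, b in zip(x, w))  # float reference: -1.4375
```

The key design point this illustrates is that all per-element work stays in integer registers; floating point appears only in the final rescaling, which is what makes a fully integer forward pass feasible.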
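The learning-rate schedule stated in the experiment setup (initial rate 0.1, divided by 10 at epochs 30, 60, and 90, over 100 epochs) can be written as a small step-decay helper. This is a re-statement of the reported hyperparameters, not the authors' code; the function name is hypothetical.

```python
def step_decay_lr(epoch, base_lr=0.1, milestones=(30, 60, 90), gamma=0.1):
    """Return the learning rate at a given epoch under the step schedule
    reported in the paper: multiply by gamma (here 0.1) at each milestone."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# Resulting schedule over the 100 training epochs:
#   epochs  0-29 -> 0.1
#   epochs 30-59 -> 0.01
#   epochs 60-89 -> 0.001
#   epochs 90-99 -> 0.0001
```

In PyTorch 1.10 (the version listed under Software Dependencies), the same schedule is conventionally expressed with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60, 90], gamma=0.1)`.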