Any-Precision Deep Neural Networks

Authors: Haichao Yu, Haoxiang Li, Humphrey Shi, Thomas S. Huang, Gang Hua (pp. 10763-10771)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We implement the whole framework in PyTorch (Paszke et al. 2017). On Cifar-10, we train AlexNet, MobileNetV2, and Resnet-20 models for 400 epochs with initial learning rate 0.001, decayed by 0.1 at epochs {150, 250, 350}. On SVHN, the 8-layer CNN named CNN-8 and Resnet-8 models are trained for 100 epochs with initial learning rate 0.001, decayed by 0.1 at epochs {50, 75, 90}. We combine the training and extra training data on SVHN as our training dataset. All models on Cifar-10 and SVHN are optimized with the Adam optimizer (Kingma and Ba 2014) without weight decay. On ImageNet, we train Resnet-18 and Resnet-50 dedicated models for 120 epochs with initial learning rate 0.1, decayed by 0.1 at epochs {30, 60, 85, 95, 105}, using the SGD optimizer. For the any-precision model, we train for 80 epochs with initial learning rate 0.3, decayed by 0.1 at epochs {45, 60, 70}.
Researcher Affiliation | Collaboration | Haichao Yu (1), Haoxiang Li (2), Humphrey Shi (1,3), Thomas S. Huang (1), Gang Hua (2); (1) UIUC, (2) Wormpex AI Research, (3) University of Oregon; {haichao3, hshi10, t-huang1}@illinois.edu, haoxiang.li@bianlifeng.com, ganghua@gmail.com
Pseudocode | Yes | Algorithm 1: Training of the proposed any-precision DNN
Require: candidate bit-widths P = {n_k}, k = 1, ..., K
1: Initialize the model M with floating-point parameters
2: Initialize K sets of Batch Norm layers Φ_1, ..., Φ_K
3: for t = 1, ..., T_iters do
4:   Sample data batch (x, y) from train set D_train
5:   for n_p in P do
6:     Set quantization bit-width N ← n_p
7:     Feed-forward pass: y_{n_p} ← M(x)
8:     Set Batch Norm layers: M.replace(Φ_p)
9:     L ← L + loss(y_{n_p}, y)
10:  end for
11:  Back-propagate to update network parameters
12: end for
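As a reading aid, below is a minimal PyTorch-style sketch of the training loop in Algorithm 1. The model interface (`model.set_bitwidth`, `model.set_bn_set`) and the candidate bit-width list are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn.functional as F

# Hypothetical any-precision model interface (assumed for illustration):
#   model.set_bitwidth(n) -- set the quantization bit-width for weights/activations
#   model.set_bn_set(k)   -- switch to the k-th set of Batch Norm layers
bit_widths = [1, 2, 4, 8, 32]  # candidate bit-widths P, as used in the paper

def train_one_epoch(model, loader, optimizer, device="cuda"):
    model.train()
    for x, y in loader:                 # sample batch (x, y) from D_train
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        total_loss = 0.0
        for k, n in enumerate(bit_widths):
            model.set_bitwidth(n)       # line 6: set quantization bit-width N <- n_p
            model.set_bn_set(k)         # line 8: use the BN layers dedicated to n_p
            logits = model(x)           # line 7: feed-forward pass y_{n_p} <- M(x)
            total_loss = total_loss + F.cross_entropy(logits, y)  # line 9: accumulate loss
        total_loss.backward()           # line 11: back-propagate the accumulated loss
        optimizer.step()                # update the shared network parameters
```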
Open Source Code | Yes | Our code is released at https://github.com/SHI-Labs/Any-Precision-DNNs.
Open Datasets | Yes | The datasets include Cifar-10 (Krizhevsky, Hinton et al. 2009), Street View House Numbers (SVHN) (Netzer et al. 2011), and ImageNet (Deng et al. 2009). We also evaluate our method on the image segmentation task to demonstrate its generalization. We train DeepLabV3 (Chen et al. 2017) with Resnet-50 on the Pascal VOC 2012 segmentation dataset (Everingham et al. 2015), with the SBD dataset (Hariharan et al. 2011) as ground-truth augmentation.
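For the SVHN detail quoted above (combining the training and extra splits), a plausible torchvision-based sketch is shown below; the transforms, root path, and batch size are placeholder assumptions.

```python
import torchvision.transforms as T
from torch.utils.data import ConcatDataset, DataLoader
from torchvision.datasets import SVHN

# Placeholder preprocessing; the paper does not specify the exact transforms.
transform = T.Compose([T.ToTensor()])

# torchvision's SVHN exposes 'train', 'extra', and 'test' splits;
# the paper combines train + extra as its training set.
train_part = SVHN(root="./data", split="train", download=True, transform=transform)
extra_part = SVHN(root="./data", split="extra", download=True, transform=transform)
train_set = ConcatDataset([train_part, extra_part])

train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)
```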
Dataset Splits | No | The paper mentions training data and test data, but does not explicitly provide details about specific training/validation/test dataset splits (e.g., percentages or sample counts for each split) for reproduction.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed machine specifications) used for running the experiments.
Software Dependencies | No | The paper mentions implementing the framework in PyTorch but does not specify a version number for PyTorch or any other software dependencies crucial for replication.
Experiment Setup | Yes | On Cifar-10, we train AlexNet, MobileNetV2, and Resnet-20 models for 400 epochs with initial learning rate 0.001, decayed by 0.1 at epochs {150, 250, 350}. On SVHN, the 8-layer CNN named CNN-8 and Resnet-8 models are trained for 100 epochs with initial learning rate 0.001, decayed by 0.1 at epochs {50, 75, 90}. We combine the training and extra training data on SVHN as our training dataset. All models on Cifar-10 and SVHN are optimized with the Adam optimizer (Kingma and Ba 2014) without weight decay. On ImageNet, we train Resnet-18 and Resnet-50 dedicated models for 120 epochs with initial learning rate 0.1, decayed by 0.1 at epochs {30, 60, 85, 95, 105}, using the SGD optimizer. For the any-precision model, we train for 80 epochs with initial learning rate 0.3, decayed by 0.1 at epochs {45, 60, 70}. For all models, following Zhou et al. (Zhou et al. 2016), we keep the first and last layers real-valued. In training, we train the networks with bit-width candidates {1, 2, 4, 8, 32}.
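A hedged sketch of the Cifar-10 optimization schedule quoted above (Adam without weight decay, learning rate 0.001 decayed by 0.1 at epochs 150/250/350, 400 epochs); the model, data loader, and per-epoch training routine are assumed to exist, and any unstated details are assumptions.

```python
import torch

# Assumes `model`, `train_loader`, and `train_one_epoch` (e.g., the loop sketched
# earlier) are defined; only the quoted hyperparameters are taken from the paper.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.0)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[150, 250, 350], gamma=0.1
)

for epoch in range(400):
    train_one_epoch(model, train_loader, optimizer)
    scheduler.step()  # decay the learning rate by 0.1 at the listed milestones
```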