BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization

Authors: Huanrui Yang, Lin Duan, Yiran Chen, Hai Li

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | BSQ achieves both higher accuracy and higher bit reduction on various model architectures on the CIFAR-10 and ImageNet datasets compared to previous methods.
Researcher Affiliation | Academia | Huanrui Yang, Lin Duan, Yiran Chen & Hai Li, Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708, USA. {huanrui.yang, lin.duan, yiran.chen, hai.li}@duke.edu
Pseudocode | No | The paper describes algorithms and processes in text and uses figures to illustrate pipelines, but it does not include formal pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements or links indicating the availability of open-source code for the described methodology.
Open Datasets | Yes | We use ResNet-20 models on the CIFAR-10 dataset (Krizhevsky & Hinton, 2009)... The CIFAR-10 dataset can be directly accessed through the dataset API provided in the torchvision python package. ...ResNet-50 and Inception-V3 models... are utilized for the experiments on the ImageNet dataset (Russakovsky et al., 2015). The ImageNet dataset... can be found at http://www.image-net.org/challenges/LSVRC/2012/nonpub-downloads. (A minimal loading sketch appears after this table.)
Dataset Splits | Yes | We use all the data in the provided training set to train our model, and use the provided validation set to evaluate our model and report the testing accuracy.
Hardware Specification | Yes | All the training processes are done on a single TITAN XP GPU. ...Two TITAN RTX GPUs are used in parallel for the BSQ training and finetuning of both ResNet-50 and Inception-V3 models. (See the multi-GPU sketch after this table.)
Software Dependencies | No | The paper mentions using the 'torchvision python package' and following the 'official PyTorch ImageNet example', but does not specify exact version numbers for these or other software dependencies.
Experiment Setup | Yes | The learning rate is set to 0.1 initially, and decayed by 0.1 at epochs 150, 250 and 325. ...The BSQ training is done for 350 epochs, with the first 250 epochs using learning rate 0.1 and the rest using learning rate 0.01. ...The finetuning is performed for 300 epochs with an initial learning rate of 0.01 and the learning rate decayed by 0.1 at epochs 150 and 250. ...All the training tasks are optimized with the SGD optimizer... with momentum 0.9 and weight decay 0.0001, and the batch size is set to 128. (An optimizer and schedule sketch appears after this table.)
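
The Open Datasets and Dataset Splits rows state that CIFAR-10 is accessed through the torchvision dataset API, with the full provided training set used for training and the provided held-out split used for evaluation. A minimal sketch of such a loader follows; the augmentation and normalization constants are common CIFAR-10 defaults and are assumptions, not details taken from the paper.

```python
import torch
import torchvision
import torchvision.transforms as transforms

# Standard CIFAR-10 channel statistics; the exact preprocessing used by the
# paper is not specified in this report, so these transforms are an assumption.
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465),
                                 (0.2470, 0.2435, 0.2616))

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])
test_transform = transforms.Compose([transforms.ToTensor(), normalize])

# The full provided training set is used for training; the provided held-out
# split is used to evaluate the model, as described in the Dataset Splits row.
train_set = torchvision.datasets.CIFAR10(root='./data', train=True,
                                         download=True, transform=train_transform)
test_set = torchvision.datasets.CIFAR10(root='./data', train=False,
                                        download=True, transform=test_transform)

# Batch size 128 is quoted in the Experiment Setup row.
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=4)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=128,
                                          shuffle=False, num_workers=4)
```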
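The Experiment Setup row quotes SGD with momentum 0.9, weight decay 0.0001, batch size 128, and an initial learning rate of 0.1 decayed by 0.1 at epochs 150, 250 and 325. The sketch below shows one way to express that schedule in PyTorch; it covers only the standard optimizer and learning-rate decay, not the BSQ bit-level sparsity regularization itself, and the resnet18 placeholder is an assumption (the paper uses ResNet-20, which torchvision does not ship).

```python
import torch
import torchvision

# Placeholder model only; the paper's ResNet-20 for CIFAR-10 is not in torchvision.
model = torchvision.models.resnet18(num_classes=10)

# Hyperparameters quoted in the Experiment Setup row.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[150, 250, 325],
                                                 gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(350):
    model.train()
    for images, labels in train_loader:  # train_loader from the sketch above
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # decay the learning rate at the quoted milestone epochs
```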
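The Hardware Specification row notes that two TITAN RTX GPUs are used in parallel for the ImageNet models. The report does not say which parallelism mechanism was used; a plain nn.DataParallel wrapper, shown below, is one straightforward assumption.

```python
import torch

# Wrap the model across two visible GPUs if available; this mirrors the
# two-GPU setup mentioned in the report, but the actual mechanism used by
# the authors is not specified.
if torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model, device_ids=[0, 1])
model = model.cuda()
```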