BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization
Authors: Huanrui Yang, Lin Duan, Yiran Chen, Hai Li
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | BSQ achieves both higher accuracy and higher bit reduction on various model architectures on the CIFAR-10 and ImageNet datasets compared to previous methods. |
| Researcher Affiliation | Academia | Huanrui Yang, Lin Duan, Yiran Chen & Hai Li; Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708, USA; {huanrui.yang, lin.duan, yiran.chen, hai.li}@duke.edu |
| Pseudocode | No | The paper describes algorithms and processes in text and uses figures to illustrate pipelines, but it does not include formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | We use ResNet-20 models on the CIFAR-10 dataset (Krizhevsky & Hinton, 2009)... The CIFAR-10 dataset can be directly accessed through the dataset API provided in the torchvision python package. ...ResNet-50 and Inception-V3 models... are utilized for the experiments on the ImageNet dataset (Russakovsky et al., 2015). The ImageNet dataset... can be found at http://www.image-net.org/challenges/LSVRC/2012/nonpub-downloads. (A minimal torchvision loading sketch follows the table.) |
| Dataset Splits | Yes | We use all the data in the provided training set to train our model, and use the provided validation set to evaluate our model and report the testing accuracy. |
| Hardware Specification | Yes | All the training processes are done on a single TITAN XP GPU. ...Two TITAN RTX GPUs are used in parallel for the BSQ training and finetuning of both Res Net-50 and Inception-V3 models. |
| Software Dependencies | No | The paper mentions using the 'torchvision python package' and following the 'official PyTorch ImageNet example' but does not specify exact version numbers for these or other software dependencies. |
| Experiment Setup | Yes | The learning rate is set to 0.1 initially, and decayed by 0.1 at epochs 150, 250 and 325. ...The BSQ training is done for 350 epochs, with the first 250 epochs using learning rate 0.1 and the rest using learning rate 0.01. ...The finetuning is performed for 300 epochs with an initial learning rate of 0.01, decayed by 0.1 at epochs 150 and 250. ...All the training tasks are optimized with the SGD optimizer... with momentum 0.9 and weight decay 0.0001, and the batch size is set to 128. (A PyTorch sketch of this configuration follows the table.) |
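As a companion to the Open Datasets row, here is a minimal sketch of loading CIFAR-10 through the torchvision dataset API mentioned above. The augmentation choices (random crop with padding 4, horizontal flip) are common CIFAR-10 conventions assumed here, not settings stated in the paper; the batch size of 128 matches the quoted experiment setup.

```python
import torch
import torchvision
import torchvision.transforms as transforms

# Standard CIFAR-10 training augmentation (assumed, not specified in the paper).
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Training split: the paper reports using all data in the provided training set.
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=train_transform)
# Validation split: used to evaluate the model and report testing accuracy.
val_set = torchvision.datasets.CIFAR10(
    root="./data", train=False, download=True, transform=transforms.ToTensor())

# Batch size 128 matches the Experiment Setup row.
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=128, shuffle=False)
```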
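For the Experiment Setup row, the sketch below shows what the quoted SGD configuration and learning-rate schedule look like in PyTorch. The ResNet-18 stand-in and the empty epoch loop are placeholders only; BSQ's quantized layers and bit-level sparsity regularizer are not reproduced here.

```python
import torch
import torchvision

# Stand-in model: the paper uses ResNet-20 on CIFAR-10, which torchvision
# does not ship, so ResNet-18 is substituted purely for illustration.
model = torchvision.models.resnet18(num_classes=10)

# SGD with momentum 0.9 and weight decay 0.0001, as quoted above.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

# Baseline schedule: decay the learning rate by 0.1 at epochs 150, 250 and 325.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[150, 250, 325], gamma=0.1)

for epoch in range(350):
    # ... one training epoch over the CIFAR-10 loader would run here ...
    scheduler.step()
```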