Bayesian Bits: Unifying Quantization and Pruning

Authors: Mart van Baalen, Christos Louizos, Markus Nagel, Rana Ali Amjad, Ying Wang, Tijmen Blankevoort, Max Welling

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally validate our proposed method on several benchmark datasets and show that we can learn pruned, mixed precision networks that provide a better trade-off between accuracy and efficiency than their static bit width equivalents.
Researcher Affiliation | Industry | Qualcomm AI Research {mart,clouizos,markusn,ramjad,yinwan,tijmen,mwelling}@qti.qualcomm.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | We considered the toy tasks of MNIST and CIFAR 10 classification... Experiments on ImageNet
Dataset Splits | No | The paper mentions 'validation scores' but does not provide specific details on the dataset splits used for validation.
Hardware Specification | Yes | The resulting total runtime for one ResNet18 experiment, consisting of 30 epochs of training with Bayesian Bits and 10 epochs of fixed-gate fine-tuning, is approximately 70 hours on a single Nvidia Tesla V100.
Software Dependencies | No | The paper mentions using a 'pretrained PyTorch model' but does not specify version numbers for PyTorch or other software dependencies.
Experiment Setup | Yes | We initialized the parameters of the gates to a large value so that the model initially uses its full 32-bit capacity without pruning. We fine-tuned the model's weights jointly with the quantization parameters for 30 epochs using Bayesian Bits. During the last epochs of Bayesian Bits training, BOP count remains stable but validation scores fluctuate due to the stochastic gates, so we fixed the gates and fine-tuned the weights and quantization ranges for another 10 epochs.
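
The two-phase schedule quoted in the Experiment Setup row (30 epochs of joint training with stochastic gates, then 10 epochs of fixed-gate fine-tuning) can be illustrated with a minimal PyTorch sketch. This is not the authors' code: BayesianBitsModel, gate_regularizer(), fix_gates(), and the weighting lambda_reg are hypothetical placeholders standing in for whatever wraps the learnable quantization/pruning gates and the paper's gate regularization term.

```python
import torch

def train_bayesian_bits(model, train_loader, criterion, lambda_reg=1e-3,
                        bb_epochs=30, finetune_epochs=10, device="cuda"):
    """Sketch of the two-phase schedule described in the Experiment Setup row.

    Assumes a hypothetical BayesianBitsModel exposing:
      - gate_regularizer(): scalar complexity penalty over the learnable gates
      - fix_gates():        freeze each stochastic gate at its hard decision
    """
    model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)

    # Phase 1: jointly train weights, quantization ranges, and stochastic gates.
    for _ in range(bb_epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss = criterion(model(x), y)
            # Placeholder for the paper's gate regularization term,
            # which pushes gates toward lower bit widths and pruning.
            loss = loss + lambda_reg * model.gate_regularizer()
            loss.backward()
            opt.step()

    # Phase 2: BOP count is stable but validation scores fluctuate due to the
    # stochastic gates, so fix the gates and fine-tune weights and ranges only.
    model.fix_gates()
    opt = torch.optim.Adam(model.parameters(), lr=1e-5)
    for _ in range(finetune_epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            opt.step()

    return model
```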