And the Bit Goes Down: Revisiting the Quantization of Neural Networks

Authors: Pierre Stock, Armand Joulin, Rémi Gribonval, Benjamin Graham, Hervé Jégou

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our approach by quantizing a high-performing ResNet-50 model to a memory size of 5 MB (20× compression factor) while preserving a top-1 accuracy of 76.1% on ImageNet object classification and by compressing a Mask R-CNN with a 26× factor. (A back-of-the-envelope check of the 20× ratio follows the table.)
Researcher Affiliation | Collaboration | Facebook AI Research; Univ Rennes, Inria, CNRS, IRISA
Pseudocode | No | The paper describes the steps of its algorithm (E-step, M-step) in paragraph form and as bullet points but does not present them in a structured pseudocode or algorithm block. (A generic EM-style codebook sketch is given after the table.)
Open Source Code | Yes | Code and compressed models: https://github.com/facebookresearch/kill-the-bits.
Open Datasets | Yes | We quantize vanilla ResNet-18 and ResNet-50 architectures pretrained on the ImageNet dataset (Deng et al., 2009). Unless explicit mention of the contrary, the pretrained models are taken from the PyTorch model zoo. ... In particular, Yalniz et al. (2019) use the publicly available YFCC-100M dataset (Thomee et al., 2015) to train a ResNet-50 that reaches 79.1% top-1 accuracy on the standard validation set of ImageNet. (A snippet for loading these model-zoo checkpoints follows the table.)
Dataset Splits | Yes | We quantize vanilla ResNet-18 and ResNet-50 architectures pretrained on the ImageNet dataset (Deng et al., 2009). ... The accuracy is the top-1 error on the standard validation set of ImageNet. ... We perform the global finetuning using the standard ImageNet training set for 9 epochs with an initial learning rate of 0.01, a weight decay of 10⁻⁴ and a momentum of 0.9. The learning rate is decayed by a factor 10 every 3 epochs.
Hardware Specification | Yes | We run our method on a 16 GB Volta V100 GPU. Quantizing a ResNet-50 with our method (including all finetuning steps) takes about one day on 1 GPU. ... We perform the fine-tuning (layer-wise and global) using distributed training on 8 V100 GPUs.
Software Dependencies | No | Unless explicit mention of the contrary, the pretrained models are taken from the PyTorch model zoo. The paper mentions PyTorch but does not specify a version number or other software dependencies with version numbers.
Experiment Setup | Yes | We quantize each layer while performing 100 steps of our method (sufficient for convergence in practice). We finetune the centroids of each layer on the standard ImageNet training set during 2,500 iterations with a batch size of 128 (resp. 64) for the ResNet-18 (resp. ResNet-50) with a learning rate of 0.01, a weight decay of 10⁻⁴ and a momentum of 0.9. For accuracy and memory reasons, the classifier is always quantized with a block size d = 4 and k = 2048 (resp. k = 1024) centroids for the ResNet-18 (resp. ResNet-50). (These hyperparameters are written out as a PyTorch optimizer/scheduler sketch after the table.)
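
As a back-of-the-envelope check on the 20× figure in the Research Type row: a standard ResNet-50 has roughly 25.6 million parameters (the usual torchvision count, an assumption not stated in the quote above), so the uncompressed float32 model is on the order of 100 MB, and a 5 MB budget corresponds to about a 20× reduction.

```python
# Rough compression-ratio check for the numbers quoted above.
n_params = 25_600_000                    # approximate ResNet-50 parameter count (assumption)
fp32_mb = n_params * 4 / (1024 ** 2)     # 4 bytes per float32 weight -> ~97.7 MB
compressed_mb = 5.0                      # memory budget reported in the paper
print(f"uncompressed ~ {fp32_mb:.1f} MB, compression ratio ~ {fp32_mb / compressed_mb:.1f}x")
```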
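
Since the Pseudocode row notes that the E-step/M-step are only described in prose, here is a generic sketch of EM-style codebook learning over weight subvectors. It is plain k-means on blocks of the weight matrix; the paper's actual objective additionally weights the reconstruction error by input activations, which is not reproduced here. The function and variable names (`quantize_weight_blocks`, `W`, `d`, `k`) are illustrative, not taken from the released code.

```python
import torch

def quantize_weight_blocks(W, k=256, d=4, n_iter=100):
    """Toy EM-style codebook learning over weight blocks.

    Plain k-means on d-dimensional subvectors of W; the paper's method also
    weights the reconstruction error by input activations, omitted here.
    Assumes W.numel() is divisible by d.
    """
    blocks = W.detach().reshape(-1, d)              # (n_blocks, d) subvectors
    # Initialise the codebook from k randomly chosen blocks.
    init = torch.randperm(blocks.size(0))[:k]
    centroids = blocks[init].clone()                # (k, d) codebook
    assign = torch.zeros(blocks.size(0), dtype=torch.long)
    for _ in range(n_iter):
        # E-step: assign every block to its nearest centroid.
        assign = torch.cdist(blocks, centroids).argmin(dim=1)
        # M-step: move each centroid to the mean of its assigned blocks.
        for c in range(k):
            members = blocks[assign == c]
            if members.numel() > 0:
                centroids[c] = members.mean(dim=0)
    # Reconstruct the quantized weight matrix from codebook + assignments.
    W_hat = centroids[assign].reshape(W.shape)
    return W_hat, centroids, assign

# Example: quantize a random 2-D weight matrix into k=256 codewords of size d=4.
W = torch.randn(512, 256)
W_hat, codebook, assignments = quantize_weight_blocks(W, k=256, d=4, n_iter=10)
```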
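
For the pretrained checkpoints referenced in the Open Datasets and Software Dependencies rows, the PyTorch model zoo baselines can be loaded through torchvision as below. The `pretrained=True` keyword reflects the torchvision API of the paper's time frame; the exact version the authors used is not specified.

```python
import torchvision.models as models

# ImageNet-pretrained baselines from the PyTorch model zoo, as quoted above.
resnet18 = models.resnet18(pretrained=True)
resnet50 = models.resnet50(pretrained=True)
resnet50.eval()  # evaluation mode when measuring top-1 accuracy on the validation set
```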
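
The finetuning hyperparameters quoted in the Dataset Splits and Experiment Setup rows map directly onto a standard SGD optimizer with a step schedule. A minimal sketch, assuming `centroid_params` stands in for the codebook tensors being finetuned (a hypothetical placeholder, not the released training code):

```python
import torch

# Placeholder for the centroids of the quantized layers (hypothetical shapes).
centroid_params = [torch.nn.Parameter(torch.randn(2048, 4))]

# SGD with the quoted hyperparameters: lr 0.01, momentum 0.9, weight decay 1e-4.
optimizer = torch.optim.SGD(centroid_params, lr=0.01, momentum=0.9, weight_decay=1e-4)

# Global finetuning: 9 epochs, learning rate decayed by a factor of 10 every 3 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

for epoch in range(9):
    # ... one pass over the standard ImageNet training set would go here ...
    scheduler.step()
```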