And the Bit Goes Down: Revisiting the Quantization of Neural Networks
Authors: Pierre Stock, Armand Joulin, Rémi Gribonval, Benjamin Graham, Hervé Jégou
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our approach by quantizing a high-performing ResNet-50 model to a memory size of 5 MB (20× compression factor) while preserving a top-1 accuracy of 76.1% on ImageNet object classification and by compressing a Mask R-CNN with a 26× factor. (A back-of-the-envelope check of the 20× figure appears after the table.) |
| Researcher Affiliation | Collaboration | ¹Facebook AI Research, ²Univ Rennes, Inria, CNRS, IRISA |
| Pseudocode | No | The paper describes the steps of its algorithm (E-step, M-step) in paragraph form and as bullet points but does not present them in a structured pseudocode or algorithm block. (A hedged sketch of such an E-step/M-step loop is given after the table.) |
| Open Source Code | Yes | Code and compressed models: https://github.com/facebookresearch/kill-the-bits. |
| Open Datasets | Yes | We quantize vanilla ResNet-18 and ResNet-50 architectures pretrained on the ImageNet dataset (Deng et al., 2009). Unless explicit mention of the contrary, the pretrained models are taken from the PyTorch model zoo. ... In particular, Yalniz et al. (Yalniz et al., 2019) use the publicly available YFCC-100M dataset (Thomee et al., 2015) to train a ResNet-50 that reaches 79.1% top-1 accuracy on the standard validation set of ImageNet. |
| Dataset Splits | Yes | We quantize vanilla ResNet-18 and ResNet-50 architectures pretrained on the ImageNet dataset (Deng et al., 2009). ... The accuracy is the top-1 error on the standard validation set of ImageNet. ... We perform the global finetuning using the standard ImageNet training set for 9 epochs with an initial learning rate of 0.01, a weight decay of 10⁻⁴ and a momentum of 0.9. The learning rate is decayed by a factor 10 every 3 epochs. (A sketch of this schedule in standard PyTorch follows the table.) |
| Hardware Specification | Yes | We run our method on a 16 GB Volta V100 GPU. Quantizing a ResNet-50 with our method (including all finetuning steps) takes about one day on 1 GPU. ... We perform the fine-tuning (layer-wise and global) using distributed training on 8 V100 GPUs. |
| Software Dependencies | No | Unless explicit mention of the contrary, the pretrained models are taken from the PyTorch model zoo. The paper mentions PyTorch but does not specify a version number or other software dependencies with version numbers. |
| Experiment Setup | Yes | We quantize each layer while performing 100 steps of our method (sufficient for convergence in practice). We finetune the centroids of each layer on the standard ImageNet training set during 2,500 iterations with a batch size of 128 (resp. 64) for the ResNet-18 (resp. ResNet-50) with a learning rate of 0.01, a weight decay of 10⁻⁴ and a momentum of 0.9. For accuracy and memory reasons, the classifier is always quantized with a block size d = 4 and k = 2048 (resp. k = 1024) centroids for the ResNet-18 (resp. ResNet-50). (These hyperparameters are collected into a config sketch after the table.) |
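
To make the 20× compression factor quoted in the Research Type row concrete, here is a back-of-the-envelope check. This is our own arithmetic, not the authors' accounting (which also covers codebooks and assignment indices); the parameter count below is the standard torchvision ResNet-50 figure, and the 5 MB compressed size is the number quoted in the abstract.

```python
# Rough sanity check of the reported ~20x compression factor.
num_params = 25_557_032                 # standard torchvision ResNet-50 parameter count (assumed)
fp32_size_mb = num_params * 4 / 1e6     # ~102 MB of float32 weights
compressed_mb = 5.0                     # compressed size reported in the abstract
print(f"compression factor ≈ {fp32_size_mb / compressed_mb:.1f}x")  # ≈ 20.4x
```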
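
Since the paper presents its per-layer E-step/M-step procedure only in prose, the snippet below sketches the general shape of such a codebook-learning loop. It is a simplification under stated assumptions: it clusters weight blocks with plain squared error, whereas the paper's objective weights the reconstruction error by the layer's input activations; the function and variable names are ours, not the authors'.

```python
import torch

def quantize_layer_em(W: torch.Tensor, k: int = 256, d: int = 4, n_steps: int = 100):
    """Sketch of an E-step/M-step codebook loop over weight blocks.

    Simplified: minimizes plain squared error on the blocks, ignoring the
    activation-weighted objective used in the paper. Assumes W.numel() is
    divisible by the block size d.
    """
    blocks = W.reshape(-1, d)                                  # split weights into d-dim blocks
    centroids = blocks[torch.randperm(blocks.size(0))[:k]].clone()

    for _ in range(n_steps):
        # E-step: assign each block to its nearest centroid.
        assignments = torch.cdist(blocks, centroids).argmin(dim=1)
        # M-step: move each centroid to the mean of its assigned blocks.
        for c in range(k):
            members = blocks[assignments == c]
            if members.numel() > 0:
                centroids[c] = members.mean(dim=0)

    W_quantized = centroids[assignments].reshape(W.shape)
    return W_quantized, centroids, assignments

# Example usage on a random weight matrix:
# W_q, codebook, codes = quantize_layer_em(torch.randn(512, 512))
```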
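
The global fine-tuning recipe quoted in the Dataset Splits row (9 epochs, initial learning rate 0.01, weight decay 10⁻⁴, momentum 0.9, learning rate divided by 10 every 3 epochs) maps onto standard PyTorch pieces. The sketch below shows that mapping only; `model` is a stand-in module and the training loop body is elided.

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 1000)   # stand-in for the quantized network

optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
# Divide the learning rate by 10 every 3 epochs, over 9 epochs total.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

for epoch in range(9):
    # ... one pass over the standard ImageNet training set would go here ...
    scheduler.step()
```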
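
For quick reference, the Experiment Setup hyperparameters are collected below into a plain config dict. The key names are ours and purely illustrative; the values are those quoted from the paper.

```python
# Per-layer quantization and centroid fine-tuning settings quoted above.
QUANTIZATION_CONFIG = {
    "em_steps_per_layer": 100,           # E-step/M-step iterations per layer
    "centroid_finetune_iters": 2500,     # per-layer centroid fine-tuning iterations
    "batch_size": {"resnet18": 128, "resnet50": 64},
    "learning_rate": 0.01,
    "weight_decay": 1e-4,
    "momentum": 0.9,
    "classifier_block_size_d": 4,
    "classifier_num_centroids_k": {"resnet18": 2048, "resnet50": 1024},
}
```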