Focused Quantization for Sparse CNNs

Authors: Yiren Zhao, Xitong Gao, Daniel Bates, Robert Mullins, Cheng-Zhong Xu

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In ResNet-50, we achieved an 18.08× CR with only 0.24% loss in top-5 accuracy, outperforming existing compression methods. We fully compressed a ResNet-18 and found that it is not only higher in CR and top-5 accuracy, but also more hardware-efficient, as it requires fewer logic gates to implement when compared to other state-of-the-art quantization methods assuming the same throughput.
Researcher Affiliation | Collaboration | 1. University of Cambridge, 2. Shenzhen Institutes of Advanced Technology, 3. University of Macau
Pseudocode | No | The paper includes diagrams (e.g., Figure 2, Figure 4) illustrating the process or architecture, but it does not contain formal pseudocode or algorithm blocks.
Open Source Code | Yes | Finally, FQ and the optimized models are fully open-source and released to the public. Available at: https://github.com/deep-fry/mayo.
Open Datasets | Yes | We applied focused compression (FC)... on the ImageNet dataset [2]... This model is a fast CIFAR-10 [13] classifier...
Dataset Splits | Yes | We applied focused compression (FC)... on the ImageNet dataset [2]... This model is a fast CIFAR-10 [13] classifier... At each step, the models were fine-tuned for 3 epochs at a learning rate of 0.001, except for the final step at 100%, which ran for 10 epochs with the learning rate decayed every 3 epochs.
Hardware Specification | No | The paper discusses 'custom hardware accelerators' and provides 'Computation resource estimates of custom accelerators' in terms of logic gates (Table 4), but it does not specify the commercial hardware (e.g., specific GPUs, CPUs) used to run the reported experiments.
Software Dependencies | No | The paper mentions techniques and tools used (e.g., 'Dynamic Network Surgery', 'Incremental Network Quantization', 'Huffman encoding') but does not provide specific software names with version numbers for reproducibility.
Experiment Setup | Yes | During fine-tuning, we additionally employed Incremental Network Quantization (INQ) [26] and gradually increased the proportion of weights being quantized to 25%, 50%, 75%, 87.5% and 100%. At each step, the models were fine-tuned for 3 epochs at a learning rate of 0.001, except for the final step at 100%, which ran for 10 epochs with the learning rate decayed every 3 epochs.
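
The INQ schedule quoted in the Experiment Setup row is concrete enough to sketch. The snippet below is a minimal illustration, not taken from the paper's released Mayo code: it assumes NumPy, and the helpers `quantize_pow2` and `inq_step`, as well as the exponent range, are hypothetical choices. It partitions a layer's weights by magnitude, freezes a growing fraction of them to signed powers of two (the INQ codebook) at the quoted proportions, and leaves the remainder at full precision for the fine-tuning passes described above.

```python
import numpy as np

# Quantization schedule quoted from the paper: the fraction of weights frozen
# to quantized values grows across the five fine-tuning steps.
SCHEDULE = [0.25, 0.50, 0.75, 0.875, 1.0]


def quantize_pow2(w, n_exponents=16):
    """Round non-zero weights to the nearest signed power of two (the INQ
    codebook). The exponent range is an illustrative choice."""
    sign = np.sign(w)
    mag = np.abs(w)
    max_exp = np.floor(np.log2(mag.max() + 1e-12))
    exp = np.clip(np.round(np.log2(np.maximum(mag, 1e-12))),
                  max_exp - n_exponents + 1, max_exp)
    q = sign * 2.0 ** exp
    q[mag == 0] = 0.0
    return q


def inq_step(weights, fraction):
    """Quantize the `fraction` of largest-magnitude weights and return the
    new weights plus a mask of the entries that remain trainable."""
    mag = np.abs(weights)
    k = int(np.ceil(fraction * mag.size))
    # Magnitude threshold so the k largest weights fall into the frozen group.
    thresh = np.partition(mag.ravel(), mag.size - k)[mag.size - k]
    frozen = mag >= thresh
    out = weights.copy()
    out[frozen] = quantize_pow2(weights[frozen])
    return out, ~frozen  # ~frozen: weights still updated during fine-tuning


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.05, size=(64, 64))  # stand-in for one layer's weights
    for frac in SCHEDULE:
        w, trainable = inq_step(w, frac)
        # In the paper's setup the model would now be fine-tuned for 3 epochs
        # at lr 0.001 (10 epochs with lr decay at the final 100% step),
        # updating only the `trainable` weights.
        print(f"quantized {frac:5.1%} of weights; "
              f"{trainable.mean():5.1%} remain trainable")
```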