SDQ: Stochastic Differentiable Quantization with Mixed Precision
Authors: Xijie Huang, Zhiqiang Shen, Shichao Li, Zechun Liu, Xianghong Hu, Jeffry Wicaksana, Eric Xing, Kwang-Ting Cheng
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively evaluate our method for several networks on different hardware (GPUs and FPGA) and datasets. SDQ outperforms all state-of-the-art mixed or single precision quantization with a lower bitwidth and is even better than the full-precision counterparts across various ResNet and MobileNet families, demonstrating its effectiveness and superiority. |
| Researcher Affiliation | Collaboration | Xijie Huang¹, Zhiqiang Shen¹ ² ³, Shichao Li¹, Zechun Liu⁴, Xianghong Hu¹ ⁵, Jeffry Wicaksana¹, Eric Xing⁶ ², Kwang-Ting Cheng¹. ¹Hong Kong University of Science and Technology; ²Mohamed bin Zayed University of Artificial Intelligence; ³Jockey Club Institute for Advanced Study, HKUST; ⁴Reality Labs, Meta Inc.; ⁵ACCESS AI Chip Center for Emerging Smart Systems; ⁶Carnegie Mellon University. |
| Pseudocode | Yes | Algorithm 1: Stochastic Differentiable Quantization (a hedged sketch of the idea appears after this table). |
| Open Source Code | No | Project page: https://huangowen.github.io/SDQ/. This link points to a project overview page, not a direct source-code repository. |
| Open Datasets | Yes | The experiments are carried out on the CIFAR10 dataset (Krizhevsky et al., 2009) and the ImageNet-1K dataset (Deng et al., 2009). |
| Dataset Splits | Yes | We only perform basic data augmentation in PyTorch (Paszke et al., 2019), which includes RandomResizedCrop and RandomHorizontalFlip during training, and a single-crop operation during evaluation (see the transform sketch after this table). The training set contains 115k images (trainval35k), and the validation set has 5k images (minival). |
| Hardware Specification | Yes | We extensively evaluate our method for several networks on different hardware (GPUs and FPGA)... We further conduct deployment experiments on various hardware (GPUs and a real FPGA system)... 4.5. Hardware Efficiency on Accelerator: We evaluate our model on an accelerator that supports mixed-precision arithmetic operations: Bit Fusion (Sharma et al., 2018). 4.6.1. FPGA System Setting: The system is implemented on the Xilinx U50 FPGA platform and consumes 259,688 LUTs and 210.5 BRAM... MAC array: 4 rows × 16 columns; frequency: 200 MHz; number of cores: 8. |
| Software Dependencies | No | We only perform basic data augmentation in PyTorch (Paszke et al., 2019), which includes RandomResizedCrop and RandomHorizontalFlip during training, and a single-crop operation during evaluation. The paper mentions PyTorch with a citation, but does not specify a version number for PyTorch or any other software library. |
| Experiment Setup | Yes | Details of all hyperparameters and training schemes are shown in Appendix C. Appendix C.1 provides detailed hyperparameters such as epochs, batch size, optimizer, initial learning rate, learning-rate scheduler, weight decay, and warmup epochs for the different network architectures and training phases. |
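
The Pseudocode row points to Algorithm 1 (Stochastic Differentiable Quantization). As a hedged illustration of the general technique, the sketch below samples one bitwidth per forward pass from learnable probabilities and routes gradients through the rounding step with a straight-through estimator; the class name `StochasticQuantizer`, the candidate bitwidths `(2, 3, 4)`, the uniform quantizer, and the probability-scaling gradient trick are all assumptions for illustration, not the paper's exact Algorithm 1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StochasticQuantizer(nn.Module):
    """Illustrative stochastic mixed-precision quantizer (hypothetical).

    Holds learnable logits over candidate bitwidths. During training, one
    bitwidth is sampled per forward pass; gradients reach the rounding step
    via a straight-through estimator and reach the logits via a
    probability-scaling surrogate.
    """

    def __init__(self, bit_candidates=(2, 3, 4)):
        super().__init__()
        self.bit_candidates = bit_candidates
        # Differentiable bitwidth parameters: one logit per candidate.
        self.logits = nn.Parameter(torch.zeros(len(bit_candidates)))

    @staticmethod
    def quantize(x, bits):
        # Uniform quantization of values clamped to [-1, 1].
        levels = 2 ** bits - 1
        x = torch.clamp(x, -1.0, 1.0)
        q = torch.round((x + 1.0) / 2.0 * levels) / levels * 2.0 - 1.0
        # Straight-through estimator: forward uses q, backward uses identity.
        return x + (q - x).detach()

    def forward(self, x):
        probs = F.softmax(self.logits, dim=0)
        if self.training:
            # Sample one bitwidth according to the learned probabilities.
            idx = int(torch.multinomial(probs, 1))
        else:
            # Deterministic choice at evaluation time.
            idx = int(torch.argmax(probs))
        out = self.quantize(x, self.bit_candidates[idx])
        # probs[idx] / probs[idx].detach() equals 1 in the forward pass but
        # lets gradients also flow into the bitwidth logits.
        return out * (probs[idx] / probs[idx].detach())


# Usage: quantize a weight tensor during a training forward pass.
quantizer = StochasticQuantizer()
w_q = quantizer(torch.randn(64, 3, 3, 3))
```

At evaluation time the sketch simply picks the most probable bitwidth; a real mixed-precision deployment would freeze one bitwidth per layer before export.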
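The Dataset Splits and Software Dependencies rows quote the augmentation pipeline (RandomResizedCrop and RandomHorizontalFlip during training, single-crop at evaluation). A minimal torchvision sketch follows; the 224×224 crop and 256-pixel resize are assumed ImageNet-style defaults and are not stated in the quoted text.

```python
from torchvision import transforms

# Training pipeline: random resized crop + horizontal flip, as quoted.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),   # assumed crop size
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Evaluation pipeline: single-crop (resize shorter side, then center-crop).
eval_transform = transforms.Compose([
    transforms.Resize(256),              # assumed resize
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
```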