BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction

Authors: Yuhang Li, Ruihao Gong, Xu Tan, Yang Yang, Peng Hu, Qi Zhang, Fengwei Yu, Wei Wang, Shi Gu

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on various handcrafted and searched neural architectures are conducted for both image classification and object detection tasks." ... "4 EXPERIMENTS: In this section, we report experimental results for the ImageNet classification task and MS COCO object detection task."
Researcher Affiliation | Collaboration | "Yuhang Li 1,2, Ruihao Gong 2, Xu Tan 2, Yang Yang 2, Peng Hu 2, Qi Zhang 2, Fengwei Yu 2, Wei Wang, Shi Gu 1. 1 University of Electronic Science and Technology of China, 2 SenseTime Research. liyuhang699@gmail.com, gongruihao@sensetime.com, gus@uestc.edu.cn"
Pseudocode | Yes | "Algorithm 1: BRECQ optimization" and "Algorithm 2: Genetic algorithm" (the latter used for the mixed-precision search). A hedged sketch of the Algorithm 1 reconstruction loop appears after the table.
Open Source Code | Yes | "Codes are available at https://github.com/yhhhli/BRECQ."
Open Datasets | Yes | "We conduct experiments on a variety of modern deep learning architectures... ImageNet classification task and MS COCO object detection task. ... The ImageNet dataset consists of 1.2M training images and 50K test images. We follow standard pre-processing (He et al., 2016) to get 1024 224×224 input images as the calibration dataset. ... For object detection tasks, we use 256 training images taken from the MS COCO dataset for calibration." See the calibration-set sketch after the table.
Dataset Splits | No | "The ImageNet dataset consists of 1.2M training images and 50K test images. We follow standard pre-processing (He et al., 2016) to get 1024 224×224 input images as the calibration dataset. ... For object detection tasks, we use 256 training images taken from the MS COCO dataset for calibration." Calibration data is described, but the paper neither frames it as a validation split for model selection or hyperparameter tuning, nor gives explicit train/validation/test splits (in counts or percentages) beyond the standard training/test sets and the small calibration set.
Hardware Specification | Yes | "And we can obtain a quantized ResNet-18 within 20 minutes on a single GTX 1080Ti GPU." ... "For the acquisition of mobile ARM CPU latency... Raspberry Pi 3B, which has a 1.2 GHz 64-bit quad-core ARM Cortex-A53."
Software Dependencies | No | The paper mentions the Adam optimizer but gives no version numbers for key software components, libraries, or languages (e.g., Python, PyTorch/TensorFlow).
Experiment Setup | Yes | "The batch size of learning is set to 32 and each block will be optimized for 2 × 10^4 iterations. The learning rate is set to 10^-3 during the whole learning process. Other hyper-parameters such as the temperature β are kept the same with Nagel et al. (2020). For activation step size, we also use Adam optimizer and set the learning rate to 4e-5." The sketches below illustrate these settings.
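
The calibration set in the Open Datasets row can be assembled in a few lines of PyTorch. A minimal sketch, assuming a torchvision-style ImageNet folder; the directory path, random sampling, and normalization constants are illustrative, not taken from the authors' code.

```python
import torch
from torchvision import datasets, transforms

# Standard ImageNet pre-processing (He et al., 2016): resize, center-crop to 224x224.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# "imagenet/train" is a placeholder path to the 1.2M-image training set.
train_set = datasets.ImageFolder("imagenet/train", transform=preprocess)

# Draw 1024 training images at random as the calibration dataset.
idx = torch.randperm(len(train_set))[:1024].tolist()
calib_data = torch.stack([train_set[i][0] for i in idx])  # (1024, 3, 224, 224)
```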
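Algorithm 1 (BRECQ optimization) from the Pseudocode row reconstructs the network block by block. Below is a minimal sketch of that loop using the Experiment Setup hyper-parameters (batch size 32, 2 × 10^4 iterations, Adam at 10^-3); `fp_block`, `quant_block`, and `cached_inputs` are placeholders, and the paper's actual objective additionally weights the output error with diagonal Fisher information and includes a rounding regularizer, which the plain MSE loss here omits.

```python
import torch
import torch.nn.functional as F

def reconstruct_block(fp_block, quant_block, cached_inputs,
                      iters=20_000, batch_size=32, lr=1e-3):
    """Tune one quantized block so its output matches the full-precision block."""
    params = [p for p in quant_block.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=lr)
    for _ in range(iters):
        # Sample a mini-batch from this block's cached calibration inputs.
        idx = torch.randint(len(cached_inputs), (batch_size,))
        x = cached_inputs[idx]
        with torch.no_grad():
            target = fp_block(x)  # full-precision block output as the target
        loss = F.mse_loss(quant_block(x), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```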
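For the activation step size in the Experiment Setup row, the learnable scale of a straight-through fake quantizer is trained with a second Adam optimizer at 4e-5. A sketch under those assumptions; `fake_quant`, the initial scale, and the bit-width are illustrative rather than the authors' implementation.

```python
import torch

# Learnable activation step size; the 0.1 initialization is a placeholder
# (in practice it would be set from calibration statistics).
step_size = torch.nn.Parameter(torch.tensor(0.1))

def fake_quant(x, s, n_bits=4):
    """Uniform fake quantization with a straight-through estimator (STE)."""
    qmax = 2 ** n_bits - 1
    q_hard = torch.clamp(torch.round(x / s), 0, qmax)
    # STE: quantized value in the forward pass, identity gradient in the backward pass.
    q = (q_hard - x / s).detach() + x / s
    return q * s

# Activation step sizes get their own Adam optimizer at lr = 4e-5.
act_optimizer = torch.optim.Adam([step_size], lr=4e-5)
```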