FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization
Authors: Jung Hyun Lee, Jeonghoon Kim, Se Jung Kwon, Dongsoo Lee
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we first empirically confirm the importance of learning a quantization grid size s1 jointly with the rounding process and the distinct contribution of additional tensors s3 and s4 to FlexRound. Then, we compare the performance of FlexRound with that of the state-of-the-art PTQ methods in a per-tensor uniform PTQ setting in the following cases: image classification on ImageNet (Russakovsky et al., 2015) with ResNet (He et al., 2016) and MobileNetV2 (Sandler et al., 2018) (Section 4.2), natural language understanding (NLU) on GLUE (Wang et al., 2018) with BERT (Devlin et al., 2018) and GPT-Neo (Black et al., 2021) (Section 4.3), natural language generation (NLG) on WikiText2 (Merity et al., 2016) and Penn Treebank (PTB) (Marcus et al., 1993) with GPT-Neo and OPT (Zhang et al., 2022), and NLG on WebNLG (Gardent et al., 2017) with GPT-2 (Radford et al., 2019) (Section 4.3). |
| Researcher Affiliation | Industry | 1NAVER Cloud, Seongnam, South Korea. Correspondence to: Jung Hyun Lee <onliwad101@gmail.com>, Jeonghoon Kim <jeonghoon.samuel@gmail.com>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks clearly labeled as such. |
| Open Source Code | No | The paper states 'Our experiments are performed based on full-precision pre-trained models provided in the BRECQ github repository1, unless otherwise noted.' and 'All experimental results are conducted by our own implementation based on open-source codes.' but does not explicitly state that the code for FlexRound itself is open-source or provide a link to their implementation. |
| Open Datasets | Yes | image classification on ImageNet (Russakovsky et al., 2015) with ResNet (He et al., 2016) and MobileNetV2 (Sandler et al., 2018) (Section 4.2), natural language understanding (NLU) on GLUE (Wang et al., 2018) with BERT (Devlin et al., 2018) and GPT-Neo (Black et al., 2021) (Section 4.3), natural language generation (NLG) on WikiText2 (Merity et al., 2016) and Penn Treebank (PTB) (Marcus et al., 1993) with GPT-Neo and OPT (Zhang et al., 2022), and NLG on WebNLG (Gardent et al., 2017) with GPT-2 (Radford et al., 2019) (Section 4.3). |
| Dataset Splits | No | The paper mentions selecting samples for reconstruction (e.g., '1024 randomly sampled images' or '1024 random samples from the training dataset') but does not specify explicit training, validation, or test dataset splits in terms of percentages, sample counts, or defined subsets for model training/evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions general software components like 'CUDA kernel' and 'Adam optimizer' but does not specify exact version numbers for programming languages, libraries, or frameworks used (e.g., Python version, PyTorch version). |
| Experiment Setup | Yes | For FlexRound, the output of each layer or block is reconstructed during 5k iterations while all learnable parameters (i.e., s1, S2, s3, and s4) are updated by using one learning rate (e.g., 4e-4 for the ResNet models quantized by 3-bit or 4-bit, or 1e-3 for the ResNet models quantized by 2-bit and MobileNetV2). The learning rate applied to all learnable parameters (s1, S2, and s3) is selected to be 2e-4 for BERT and to be 3e-4 for GPT-Neo regardless of the task to demonstrate that Q + FlexRound can broadly surpass Q + AdaRound without the need of significant efforts to select the optimal learning rate for each task. Reconstruction process is performed by using 1024 random samples for 20K iterations. |
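For context on the learnable parameters named in the table (s1, S2, s3, s4): FlexRound, per its title, rounds weights after dividing them element-wise by a flexible, learnable grid built from a common scale s1, an element-wise matrix S2, and row/column-wise vectors s3 and s4. The sketch below is a hedged NumPy illustration of that element-wise-division idea, not the authors' implementation; the function name `flexround_quantize`, the signed clipping range, and the exact way s3/s4 are broadcast are assumptions made for the example.

```python
import numpy as np

def flexround_quantize(W, s1, S2, s3, s4, n_bits=8):
    """Illustrative sketch of FlexRound-style quantization for a linear-layer
    weight W of shape (out, in).

    The division grid combines a common (per-tensor) scale s1 with an
    element-wise factor S2 (same shape as W), a row-wise factor s3 of shape
    (out, 1), and a column-wise factor s4 of shape (1, in). Weights are
    divided by this flexible grid, rounded, clipped to the signed integer
    range, then dequantized on the common grid s1. Broadcasting details are
    assumptions for this example, not the paper's exact formulation.
    """
    s_prime = s1 * S2 * s3 * s4          # flexible element-wise division grid
    q = np.round(W / s_prime)            # rounding driven by element-wise division
    qmax = 2 ** (n_bits - 1) - 1
    q = np.clip(q, -qmax - 1, qmax)      # assumed signed b-bit clipping
    return s1 * q                        # dequantize on the common grid s1
```

With S2, s3, and s4 all set to ones, this degenerates to ordinary per-tensor rounding on the grid s1; learning S2/s3/s4 lets each weight's effective rounding threshold move flexibly, which is the mechanism the paper's title describes.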