AMPA: Adaptive Mixed Precision Allocation for Low-Bit Integer Training

Authors: Li Ding, Wen Fei, Yuyang Huang, Shuangrui Ding, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on different backbones and datasets show that, compared to INT8 quantization, the proposed method can achieve more than 38% BitOPs reduction with a tolerable loss below 2% in image classification, image segmentation, and language modeling. (See the BitOPs sketch after this table.)
Researcher Affiliation | Academia | (1) School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China. (2) Department of Information Engineering, The Chinese University of Hong Kong, Shatin, NT, Hong Kong.
Pseudocode | Yes | Algorithm 1: AMPA training framework. Input: the initialized model, the number of epochs N, the update interval Itv, the fair allocation threshold Thr. Output: the trained model with mixed-precision weights and activations. (See the training-loop sketch after this table.)
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the described methodology.
Open Datasets | Yes | We experiment with ResNet-18/20/32/56 (He et al., 2016), MobileNetV2 (Sandler et al., 2018), InceptionV3 (Szegedy et al., 2016), and ViT (Dosovitskiy et al., 2021) on CIFAR-10/100, and ResNet-18/34/50 (He et al., 2016), MobileNetV2 (Sandler et al., 2018) and ViT-S (Dosovitskiy et al., 2021) on ImageNet. We train the popular segmentation network U-Net (Ronneberger et al., 2015) on the DSB2018 dataset and Kvasir dataset (Jha et al., 2020)... For the language modeling task, we choose the widely known Transformer network (Vaswani et al., 2017) and train it on the WikiText-2 (Merity et al., 2017), WikiText-103 (Merity et al., 2017) and Penn Treebank datasets (Marcus et al., 1993).
Dataset Splits | Yes | The proposed adaptive mixed precision allocation (AMPA) training framework employs consistent hyperparameters: the fair allocation threshold is Thr = 3, the update frequency (Itv/N) is set to 0.05, and the layer update ratios α, β, and γ are 10, 20, and 30 for weights, activations, and gradients, respectively. The training process consists of 200 epochs.
Hardware Specification | Yes | We simulate the training process on the FPGA device with the proposed AMPA training framework. Table 14 illustrates the results of applying our method and full INT8 training (Zhu et al., 2020) to train ResNet-20 on CIFAR-10. The batch size is 64 and the FPGA chip is selected as xc7vx485tffg1157.
Software Dependencies | No | The paper mentions various models and datasets, but does not specify any software dependencies (e.g., Python, PyTorch, TensorFlow versions) with version numbers.
Experiment Setup | Yes | The proposed adaptive mixed precision allocation (AMPA) training framework employs consistent hyperparameters: the fair allocation threshold is Thr = 3, the update frequency (Itv/N) is set to 0.05, and the layer update ratios α, β, and γ are 10, 20, and 30 for weights, activations, and gradients, respectively. The bitwidths are selected among INT4, INT6, and INT8. We employ symmetric uniform quantization for weights and gradients while adopting asymmetric uniform quantization for activations. The training process consists of 200 epochs. (See the quantizer sketch after this table.)
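The Pseudocode row above lists only the inputs and outputs of Algorithm 1, so the following is a minimal structural sketch of what an AMPA-style training loop could look like. It assumes that every Itv epochs the least-sensitive α/β/γ share of layers steps its weight/activation/gradient bitwidth down one level (INT8 → INT6 → INT4), that the update ratios 10/20/30 are percentages of layers, and that the fair allocation threshold Thr caps how many reductions any single layer may receive. The sensitivity measure and all helper names are hypothetical placeholders, not the authors' exact procedure.

```python
import random

BIT_LEVELS = [8, 6, 4]  # candidate integer bitwidths from the paper


def measure_sensitivity(tensor_kind, layer_idx):
    """Hypothetical sensitivity proxy; the paper derives its own metric."""
    return random.random()


def train_one_epoch(bitwidths):
    """Placeholder for one epoch of low-bit integer training with the current allocation."""
    pass


def ampa_train(num_layers, N=200, itv_ratio=0.05, Thr=3, ratios=None):
    # assumed update ratios for weights / activations / gradients (10%, 20%, 30%)
    ratios = ratios or {"w": 0.10, "a": 0.20, "g": 0.30}
    Itv = max(1, int(N * itv_ratio))                  # update interval in epochs
    bits = {k: [8] * num_layers for k in ratios}      # per-layer bitwidth state
    drops = {k: [0] * num_layers for k in ratios}     # reductions received so far

    for epoch in range(N):
        train_one_epoch(bits)
        if (epoch + 1) % Itv != 0:
            continue
        for k, ratio in ratios.items():
            sens = [measure_sensitivity(k, i) for i in range(num_layers)]
            # lower the bitwidth of the least-sensitive layers, up to Thr times each
            for i in sorted(range(num_layers), key=sens.__getitem__)[: int(ratio * num_layers)]:
                if drops[k][i] < Thr and bits[k][i] > BIT_LEVELS[-1]:
                    bits[k][i] = BIT_LEVELS[BIT_LEVELS.index(bits[k][i]) + 1]  # 8 -> 6 or 6 -> 4
                    drops[k][i] += 1
    return bits


mixed_bits = ampa_train(num_layers=20)
```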
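The Experiment Setup row states that weights and gradients use symmetric uniform quantization while activations use asymmetric uniform quantization, with bitwidths drawn from INT4/INT6/INT8. Below is a minimal per-tensor fake-quantization sketch of those two quantizer types in PyTorch; the function names, per-tensor granularity, and min/max range estimation are assumptions, not the paper's exact quantizers.

```python
import torch


def symmetric_quantize(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform quantization (zero point = 0), e.g. for weights and gradients."""
    qmax = 2 ** (bits - 1) - 1                       # e.g. 127 for INT8, 7 for INT4
    scale = x.abs().max().clamp(min=1e-8) / qmax     # per-tensor scale (assumed)
    q = torch.round(x / scale).clamp(-qmax - 1, qmax)
    return q * scale                                 # dequantized ("fake-quantized") tensor


def asymmetric_quantize(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Asymmetric uniform quantization with a zero point, e.g. for activations."""
    qmax = 2 ** bits - 1                             # e.g. 255 for INT8
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min).clamp(min=1e-8) / qmax
    zero_point = torch.round(-x_min / scale)
    q = torch.round(x / scale + zero_point).clamp(0, qmax)
    return (q - zero_point) * scale


# Example usage with the paper's candidate bitwidths.
BITWIDTHS = (4, 6, 8)
w = torch.randn(64, 3, 3, 3)
w_q = symmetric_quantize(w, bits=4)
```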
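The Research Type row quotes a more-than-38% BitOPs reduction relative to INT8 training. A back-of-envelope illustration of how such a figure arises, assuming the common convention BitOPs = MACs × weight bitwidth × activation bitwidth; the layer sizes and per-layer bitwidth allocation below are illustrative, not values reported in the paper.

```python
def bitops(macs, w_bits, a_bits):
    # BitOPs of one layer under the MACs * w_bits * a_bits convention
    return macs * w_bits * a_bits


layers = [
    # (MACs, weight bits, activation bits) -- illustrative mixed allocation
    (1.0e8, 8, 8),
    (2.0e8, 6, 8),
    (2.0e8, 4, 6),
    (1.0e8, 4, 4),
]

mixed = sum(bitops(m, w, a) for m, w, a in layers)
int8 = sum(bitops(m, 8, 8) for m, _, _ in layers)
print(f"BitOPs reduction vs. INT8: {1 - mixed / int8:.1%}")  # ~41.7% for this toy allocation
```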