Toward Efficient Low-Precision Training: Data Format Optimization and Hysteresis Quantization

Authors: Sunwoo Lee, Jeongwoo Park, Dongsuk Jeon

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We measure the training performance of this format and compare it against other 8-bit data formats from recent studies by applying those formats to the training of various neural network models.
Researcher Affiliation | Academia | Sunwoo Lee, Jeongwoo Park, Dongsuk Jeon; Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Korea; {ori915,jeffjw,djeon1}@snu.ac.kr
Pseudocode | No | The paper describes the methods textually and with mathematical equations, but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | We created a package named lptorch, short for low precision PyTorch, and the code can be found in the supplementary material.
Open Datasets | Yes | Training loss is obtained by training ResNet-18 on CIFAR-10 dataset using SGD with a momentum of 0.9 for 60 epochs.
Dataset Splits | Yes | Fig. 1(b) shows the Top-1 validation accuracy of ResNet-18 (He et al., 2016) trained on ImageNet.
Hardware Specification | No | The paper discusses hardware implementation costs for MAC units (e.g., 'Synthesized in 40nm Process', 'FPGA (XC7A100TCSG324-1)') related to the proposed formats, but it does not specify the hardware (e.g., GPU or CPU models) used to run the neural network training experiments.
Software Dependencies | No | The paper mentions software such as 'PyTorch', 'C++ and CUDA codes', 'Python APIs', 'FairSeq', and 'SGD' but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | We conducted ImageNet experiments using SGD with a momentum of 0.9 for 90 epochs with a batch size of 256 images and an initial learning rate of 0.1 which is decayed by a factor of 10 at the 30th and 60th epochs.
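The Experiment Setup row quotes a standard ImageNet training recipe. The sketch below restates only those reported hyperparameters (SGD with momentum 0.9, initial learning rate 0.1 decayed by 10x at the 30th and 60th epochs, 90 epochs, batch size 256) as a plain PyTorch baseline; it is an assumed reconstruction, not the authors' lptorch low-precision pipeline, and train_loader is a hypothetical ImageNet data loader yielding batches of 256 images.

```python
# Minimal sketch of the reported ImageNet recipe in plain PyTorch.
# Assumption: quantization of weights/activations/gradients (the paper's
# actual contribution) is omitted; only the quoted hyperparameters appear.
import torch
import torchvision

model = torchvision.models.resnet18(num_classes=1000)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# Decay the learning rate by a factor of 10 at the 30th and 60th epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 60], gamma=0.1
)

def train(train_loader, epochs=90):
    # train_loader is assumed to yield (images, labels) batches of 256 images.
    for epoch in range(epochs):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()
```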