Training Quantized Nets: A Deeper Understanding

Authors: Hao Li, Soham De, Zheng Xu, Christoph Studer, Hanan Samet, Tom Goldstein

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We investigate training methods for quantized neural networks from a theoretical viewpoint. We first explore accuracy guarantees for training methods under convexity assumptions. We then look at the behavior of these algorithms for non-convex problems, and show that training algorithms that exploit high-precision representations have an important greedy search phase that purely quantized training methods lack, which explains the difficulty of training using low-precision arithmetic. [...] (Section 6, Experiments) To explore the implications of the theory above, we train both VGG-like networks [24] and Residual networks [25] with binarized weights on image classification problems. On CIFAR-10, we train ResNet-56, wide ResNet-56 (WRN-56-2, with 2X more filters than ResNet-56), VGG-9, and the high capacity VGG-BC network used for the original BC model [5]. We also train ResNet-56 on CIFAR-100, and ResNet-18 on ImageNet [26].
Researcher Affiliation | Academia | (1) Department of Computer Science, University of Maryland, College Park; (2) School of Electrical and Computer Engineering, Cornell University. {haoli,sohamde,xuzh,hjs,tomg}@cs.umd.edu, studer@cornell.edu
Pseudocode | No | The paper describes its algorithms using mathematical equations (e.g., Eqs. 2, 4, 5, 6, 7) but does not provide structured pseudocode blocks or a section explicitly labeled 'Algorithm' or 'Pseudocode'. (A hedged sketch of the SR and BC update rules appears after this table.)
Open Source Code | No | The paper does not provide any concrete access information for source code, such as a repository link or an explicit statement of code release.
Open Datasets | Yes | On CIFAR-10, we train ResNet-56, wide ResNet-56 (WRN-56-2, with 2X more filters than ResNet-56), VGG-9, and the high capacity VGG-BC network used for the original BC model [5]. We also train ResNet-56 on CIFAR-100, and ResNet-18 on ImageNet [26].
Dataset Splits | Yes | The image pre-processing and data augmentation procedures are the same as [25].
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for the experiments are mentioned in the paper.
Software Dependencies | No | We use Adam [27] as our baseline optimizer... No version numbers for Adam or any other software dependencies are provided.
Experiment Setup | Yes | We set the initial learning rate to 0.01 and decrease the learning rate by a factor of 10 at epochs 82 and 122 for CIFAR-10 and CIFAR-100 [25]. For ImageNet experiments, we train the model for 90 epochs and decrease the learning rate at epochs 30 and 60. [...] To verify this, we tried different batch sizes for SR including 128, 256, 512 and 1024, and found that the larger the batch size, the better the performance of SR. (A sketch of this learning-rate schedule appears after this table.)
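
The Pseudocode row notes that the paper states its algorithms only as equations. For orientation, here is a minimal NumPy sketch of the two update rules the paper analyzes: stochastic rounding (SR), which keeps only low-precision weights, and BinaryConnect (BC), which keeps a full-precision copy and binarizes it for the forward/backward pass. The function names, the uniform grid step `delta`, and the demo values are illustrative assumptions, not the authors' code.

```python
import numpy as np

def stochastic_round(x, delta):
    """Round x onto the grid {k * delta} stochastically; unbiased in expectation."""
    lower = np.floor(x / delta) * delta
    prob_up = (x - lower) / delta                    # probability of rounding up
    return lower + delta * (np.random.rand(*x.shape) < prob_up)

def sr_step(w_q, grad, lr, delta):
    """SR update: the low-precision weights are the only state that is kept."""
    return stochastic_round(w_q - lr * grad, delta)

def bc_step(w_r, grad, lr):
    """BC update: a real-valued copy w_r accumulates updates; the binarized
    weights sign(w_r) are what the forward/backward pass uses."""
    w_r = np.clip(w_r - lr * grad, -1.0, 1.0)        # clipping as in the original BC scheme
    return w_r, np.sign(w_r)

# Tiny demo on random weights and a fake gradient (values are arbitrary).
w = np.random.randn(4)
delta = 2.0 ** -4                                    # assumed fixed-point resolution
w_q = stochastic_round(w, delta)                     # project onto the low-precision grid
w_q = sr_step(w_q, np.random.randn(4), lr=0.01, delta=delta)
w_r, w_b = bc_step(w.copy(), np.random.randn(4), lr=0.01)
```

The contrast the paper draws is visible here: SR discards the small real-valued residual at every step, while BC accumulates it in `w_r`, which is the high-precision state the paper argues enables the greedy search phase.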
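The schedule quoted in the Experiment Setup row maps onto a standard step decay. Below is a minimal PyTorch sketch using Adam as in the paper; the placeholder model, the total CIFAR epoch count, and the empty loop body are assumptions, since the excerpt does not specify them.

```python
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR

model = nn.Linear(10, 10)        # placeholder; the paper trains ResNet-56/VGG variants

optimizer = Adam(model.parameters(), lr=0.01)                        # initial LR 0.01
# CIFAR-10/100: divide the learning rate by 10 at epochs 82 and 122.
scheduler = MultiStepLR(optimizer, milestones=[82, 122], gamma=0.1)
# ImageNet: 90 epochs with drops at epochs 30 and 60 would instead be
#   MultiStepLR(optimizer, milestones=[30, 60], gamma=0.1)

num_epochs = 160                 # assumed; the excerpt does not give the CIFAR total
for epoch in range(num_epochs):
    # ... one epoch of training with binarized weights would go here ...
    optimizer.step()             # stand-in so the scheduler follows an optimizer step
    scheduler.step()

print(optimizer.param_groups[0]["lr"])   # 0.01 -> 0.001 -> 0.0001 after both drops
```

The batch-size observation for SR (128 through 1024, larger performing better) would only change the data loader, not this schedule.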