Training Quantized Nets: A Deeper Understanding

Authors: Hao Li, Soham De, Zheng Xu, Christoph Studer, Hanan Samet, Tom Goldstein

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We investigate training methods for quantized neural networks from a theoretical viewpoint. We first explore accuracy guarantees for training methods under convexity assumptions. We then look at the behavior of these algorithms for non-convex problems, and show that training algorithms that exploit high-precision representations have an important greedy search phase that purely quantized training methods lack, which explains the difficulty of training using low-precision arithmetic. [...] (Section 6, Experiments) To explore the implications of the theory above, we train both VGG-like networks [24] and Residual networks [25] with binarized weights on image classification problems. On CIFAR-10, we train ResNet-56, wide ResNet-56 (WRN-56-2, with 2X more filters than ResNet-56), VGG-9, and the high capacity VGG-BC network used for the original BC model [5]. We also train ResNet-56 on CIFAR-100, and ResNet-18 on ImageNet [26].
Researcher Affiliation | Academia | (1) Department of Computer Science, University of Maryland, College Park; (2) School of Electrical and Computer Engineering, Cornell University. {haoli,sohamde,xuzh,hjs,tomg}@cs.umd.edu, studer@cornell.edu
Pseudocode | No | The paper describes its algorithms using mathematical equations (e.g., Eqs. 2, 4, 5, 6, 7) but does not provide structured pseudocode blocks or a section explicitly labeled 'Algorithm' or 'Pseudocode'. (A hedged sketch of the SR and BC update rules appears after this table.)
Open Source Code | No | The paper does not provide any concrete access information for source code, such as a repository link or an explicit statement of code release.
Open Datasets | Yes | On CIFAR-10, we train ResNet-56, wide ResNet-56 (WRN-56-2, with 2X more filters than ResNet-56), VGG-9, and the high capacity VGG-BC network used for the original BC model [5]. We also train ResNet-56 on CIFAR-100, and ResNet-18 on ImageNet [26].
Dataset Splits | Yes | The image pre-processing and data augmentation procedures are the same as [25].
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for the experiments are mentioned in the paper.
Software Dependencies | No | We use Adam [27] as our baseline optimizer... No version numbers for Adam or any other software dependencies are provided.
Experiment Setup | Yes | We set the initial learning rate to 0.01 and decrease the learning rate by a factor of 10 at epochs 82 and 122 for CIFAR-10 and CIFAR-100 [25]. For ImageNet experiments, we train the model for 90 epochs and decrease the learning rate at epochs 30 and 60. [...] To verify this, we tried different batch sizes for SR including 128, 256, 512 and 1024, and found that the larger the batch size, the better the performance of SR. (A sketch of this learning-rate schedule appears after this table.)
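
The Pseudocode row notes that the paper states its algorithms only as equations. For orientation, here is a minimal NumPy sketch of the two update rules the paper analyzes: stochastic rounding (SR), which keeps only low-precision weights, and BinaryConnect (BC), which keeps a full-precision copy and binarizes it for the forward/backward pass. The function names, the uniform grid step `delta`, and the demo values are illustrative assumptions, not the authors' code.

```python
import numpy as np

def stochastic_round(x, delta):
    """Round x onto the grid {k * delta} stochastically; unbiased in expectation."""
    lower = np.floor(x / delta) * delta
    prob_up = (x - lower) / delta                    # probability of rounding up
    return lower + delta * (np.random.rand(*x.shape) < prob_up)

def sr_step(w_q, grad, lr, delta):
    """SR update: the low-precision weights are the only state that is kept."""
    return stochastic_round(w_q - lr * grad, delta)

def bc_step(w_r, grad, lr):
    """BC update: a real-valued copy w_r accumulates updates; the binarized
    weights sign(w_r) are what the forward/backward pass uses."""
    w_r = np.clip(w_r - lr * grad, -1.0, 1.0)        # clipping as in the original BC scheme
    return w_r, np.sign(w_r)

# Tiny demo on random weights and a fake gradient (values are arbitrary).
w = np.random.randn(4)
delta = 2.0 ** -4                                    # assumed fixed-point resolution
w_q = stochastic_round(w, delta)                     # project onto the low-precision grid
w_q = sr_step(w_q, np.random.randn(4), lr=0.01, delta=delta)
w_r, w_b = bc_step(w.copy(), np.random.randn(4), lr=0.01)
```

The contrast the paper draws is visible here: SR discards the small real-valued residual at every step, while BC accumulates it in `w_r`, which is the high-precision state the paper argues enables the greedy search phase.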
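The schedule quoted in the Experiment Setup row maps onto a standard step decay. Below is a minimal PyTorch sketch using Adam as in the paper; the placeholder model, the total CIFAR epoch count, and the empty loop body are assumptions, since the excerpt does not specify them.

```python
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR

model = nn.Linear(10, 10)        # placeholder; the paper trains ResNet-56/VGG variants

optimizer = Adam(model.parameters(), lr=0.01)                        # initial LR 0.01
# CIFAR-10/100: divide the learning rate by 10 at epochs 82 and 122.
scheduler = MultiStepLR(optimizer, milestones=[82, 122], gamma=0.1)
# ImageNet: 90 epochs with drops at epochs 30 and 60 would instead be
#   MultiStepLR(optimizer, milestones=[30, 60], gamma=0.1)

num_epochs = 160                 # assumed; the excerpt does not give the CIFAR total
for epoch in range(num_epochs):
    # ... one epoch of training with binarized weights would go here ...
    optimizer.step()             # stand-in so the scheduler follows an optimizer step
    scheduler.step()

print(optimizer.param_groups[0]["lr"])   # 0.01 -> 0.001 -> 0.0001 after both drops
```

The batch-size observation for SR (128 through 1024, larger performing better) would only change the data loader, not this schedule.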