Training Quantized Nets: A Deeper Understanding
Authors: Hao Li, Soham De, Zheng Xu, Christoph Studer, Hanan Samet, Tom Goldstein
NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We investigate training methods for quantized neural networks from a theoretical viewpoint. We first explore accuracy guarantees for training methods under convexity assumptions. We then look at the behavior of these algorithms for non-convex problems, and show that training algorithms that exploit high-precision representations have an important greedy search phase that purely quantized training methods lack, which explains the difficulty of training using low-precision arithmetic. [...] 6 Experiments To explore the implications of the theory above, we train both VGG-like networks [24] and Residual networks [25] with binarized weights on image classification problems. On CIFAR-10, we train ResNet-56, wide ResNet-56 (WRN-56-2, with 2X more filters than ResNet-56), VGG-9, and the high-capacity VGG-BC network used for the original BC model [5]. We also train ResNet-56 on CIFAR-100, and ResNet-18 on ImageNet [26]. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Maryland, College Park 2School of Electrical and Computer Engineering, Cornell University {haoli,sohamde,xuzh,hjs,tomg}@cs.umd.edu, studer@cornell.edu |
| Pseudocode | No | The paper describes its algorithms using mathematical equations (e.g., Eq. 2, 4, 5, 6, 7) but does not provide structured pseudocode blocks or a section explicitly labeled 'Algorithm' or 'Pseudocode'. (An illustrative, hedged sketch of the two update rules is given below the table.) |
| Open Source Code | No | The paper does not provide any concrete access information for source code, such as a repository link or an explicit statement of code release. |
| Open Datasets | Yes | On CIFAR-10, we train ResNet-56, wide ResNet-56 (WRN-56-2, with 2X more filters than ResNet-56), VGG-9, and the high-capacity VGG-BC network used for the original BC model [5]. We also train ResNet-56 on CIFAR-100, and ResNet-18 on ImageNet [26]. |
| Dataset Splits | Yes | The image pre-processing and data augmentation procedures are the same as [25]. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for the experiments are mentioned in the paper. |
| Software Dependencies | No | We use Adam [27] as our baseline optimizer... No version numbers for Adam or any other software dependencies are provided. |
| Experiment Setup | Yes | We set the initial learning rate to 0.01 and decrease the learning rate by a factor of 10 at epochs 82 and 122 for CIFAR-10 and CIFAR-100 [25]. For ImageNet experiments, we train the model for 90 epochs and decrease the learning rate at epochs 30 and 60. [...] To verify this, we tried different batch sizes for SR including 128, 256, 512 and 1024, and found that the larger the batch size, the better the performance of SR. (A hedged sketch of this training schedule is given below the table.) |
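
The paper's central contrast (quoted in the Research Type row) is between purely quantized training, e.g. stochastic rounding (SR), and methods such as BinaryConnect (BC) that keep a high-precision copy of the weights and quantize only when the weights are used. Since the paper gives these rules as equations rather than pseudocode, the following NumPy sketch is only an illustration under assumed notation: the function names, the sign binarization for BC, and the uniform grid of spacing `delta` for SR are assumptions, not the paper's exact formulation.

```python
import numpy as np

def binarize(w):
    """Deterministic sign binarization used when BC weights are consumed by the
    forward/backward pass (exact zeros are mapped to +1)."""
    return np.sign(w) + (w == 0)

def stochastic_round(w, delta=1.0):
    """Round each entry of w to a neighboring multiple of delta, picking the
    upper neighbor with probability proportional to proximity (unbiased)."""
    low = np.floor(w / delta) * delta
    p_up = (w - low) / delta
    return low + delta * (np.random.rand(*w.shape) < p_up)

def bc_step(w_real, grad, lr):
    """BinaryConnect-style step: the (possibly tiny) gradient update is
    accumulated in the full-precision copy; quantization happens separately."""
    w_real = w_real - lr * grad
    return w_real, binarize(w_real)

def sr_step(w_quant, grad, lr, delta=1.0):
    """Stochastic-rounding step: the update is applied and immediately rounded
    back onto the quantized grid, so no high-precision state is kept."""
    return stochastic_round(w_quant - lr * grad, delta)
```

With a small learning rate, `bc_step` always records the change in the real-valued copy, whereas `sr_step` leaves a quantized weight unchanged with high probability; this is the mechanism behind the quoted claim that high-precision methods have a greedy search phase that purely quantized training lacks.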
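The Experiment Setup and Software Dependencies rows report Adam with an initial learning rate of 0.01, decayed by a factor of 10 at epochs 82 and 122 for CIFAR-10/CIFAR-100. The PyTorch sketch below only reconstructs those stated numbers; the torchvision ResNet-18 stand-in (the paper trains ResNet-56, WRN-56-2, and VGG variants), the 164-epoch budget, the batch size of 128, and the standard CIFAR augmentation are assumptions, and weight binarization itself is omitted (see the update-rule sketch above).

```python
import torch
import torchvision
import torchvision.transforms as T

# Standard CIFAR-10 augmentation (random crop + horizontal flip) as an assumed
# stand-in for "the same pre-processing as [25]".
transform = T.Compose([T.RandomCrop(32, padding=4), T.RandomHorizontalFlip(), T.ToTensor()])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = torchvision.models.resnet18(num_classes=10)   # placeholder for ResNet-56
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)   # Adam baseline, lr = 0.01
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[82, 122], gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(164):                               # total epoch count is an assumption
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()                                    # applies the x0.1 decay at epochs 82 and 122
```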