ProxQuant: Quantized Neural Networks via Proximal Operators

Authors: Yu Bai, Yu-Xiang Wang, Edo Liberty

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness and flexibility of PROXQUANT through systematic experiments on (1) image classification with ResNets (Section 4.1); (2) language modeling with LSTMs (Section 4.2). The PROXQUANT method outperforms the state-of-the-art results on binary quantization and is comparable with the state-of-the-art on multi-bit quantization. We perform image classification on the CIFAR-10 dataset, which contains 50000 training images and 10000 test images of size 32x32.
Researcher Affiliation | Collaboration | Yu Bai (Stanford University, yub@stanford.edu); Yu-Xiang Wang (UC Santa Barbara, yuxiangw@cs.ucsb.edu); Edo Liberty (Amazon AI, libertye@amazon.com)
Pseudocode | Yes | Algorithm 1 PROXQUANT: Prox-gradient method for quantized net training
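
To make the pseudocode row concrete: Algorithm 1 alternates a stochastic gradient step with a proximal step. The sketch below (Python/NumPy) illustrates the binary-quantization case, where the prox of the W-shaped regularizer lam * min(|w - 1|, |w + 1|) soft-thresholds each weight toward its nearest value in {-1, +1}. The helper names prox_binary and prox_gradient_step are illustrative, not taken from the authors' code.

import numpy as np

def prox_binary(theta, lam):
    # Prox of lam * min(|w - 1|, |w + 1|): soft-threshold each weight
    # toward its nearest binary value in {-1, +1}.
    sign = np.sign(theta)
    sign = np.where(sign == 0, 1.0, sign)        # break ties toward +1
    residual = theta - sign                      # offset from the nearest of {-1, +1}
    shrunk = np.sign(residual) * np.maximum(np.abs(residual) - lam, 0.0)
    return sign + shrunk

def prox_gradient_step(theta, grad, lr, lam):
    # One ProxQuant-style update: gradient step, then the proximal operator.
    return prox_binary(theta - lr * grad, lam)

# Example: weights drift toward {-1, +1} as lam grows.
theta = np.array([0.3, -0.7, 1.2])
print(prox_gradient_step(theta, grad=np.zeros(3), lr=0.01, lam=0.1))
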
Open Source Code | Yes | Code available at https://github.com/allenbai01/ProxQuant.
Open Datasets | Yes | We perform image classification on the CIFAR-10 dataset, which contains 50000 training images and 10000 test images of size 32x32. We perform language modeling with LSTMs (Hochreiter & Schmidhuber, 1997) on the Penn Treebank (PTB) dataset (Marcus et al., 1993), which contains 929K training tokens, 73K validation tokens, and 82K test tokens.
Dataset Splits | Yes | We perform image classification on the CIFAR-10 dataset, which contains 50000 training images and 10000 test images of size 32x32. We perform language modeling with LSTMs (Hochreiter & Schmidhuber, 1997) on the Penn Treebank (PTB) dataset (Marcus et al., 1993), which contains 929K training tokens, 73K validation tokens, and 82K test tokens.
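
Both datasets are standard benchmarks. For reference, the CIFAR-10 splits quoted above can be obtained with common loaders; the snippet below uses torchvision, which is an assumption about tooling, not something the paper specifies. The PTB splits (929K/73K/82K tokens) are usually taken from the commonly used Mikolov-preprocessed plain-text files.

from torchvision import datasets, transforms

# Downloads the standard 50,000-image training split and 10,000-image test split.
train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())
test_set = datasets.CIFAR10(root="./data", train=False, download=True,
                            transform=transforms.ToTensor())
print(len(train_set), len(test_set))  # 50000 10000
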
Hardware Specification | No | The paper does not explicitly state the hardware specifications (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions software components such as the Adam optimizer, but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | We use the homotopy method λ_t = λ·t with λ = 10^-4 as the regularization strength and Adam with a constant learning rate of 0.01 as the optimizer. For BinaryConnect, we train with the recommended Adam optimizer with learning rate decay (Courbariaux et al., 2015) (initial learning rate 0.01, multiplied by 0.1 at epochs 81 and 122).
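
A rough PyTorch-style sketch of that setup is given below, with a toy linear model and random data standing in for the ResNet/CIFAR-10 pipeline. The per-iteration homotopy schedule and the choice of which parameters get the prox are assumptions for illustration, not the authors' exact code.

import torch
import torch.nn as nn

def prox_binary_(w, lam):
    # In-place prox of lam * min(|w - 1|, |w + 1|): soft-threshold toward {-1, +1}.
    sign = torch.sign(w)
    sign[sign == 0] = 1.0
    residual = w - sign
    w.copy_(sign + torch.sign(residual) * (residual.abs() - lam).clamp(min=0.0))

# Toy stand-ins for the real ResNet / CIFAR-10 training pipeline.
model = nn.Linear(32, 10)
data = [(torch.randn(8, 32), torch.randint(0, 10, (8,))) for _ in range(5)]
loss_fn = nn.CrossEntropyLoss()

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)   # constant learning rate
base_lambda = 1e-4                                           # homotopy strength lambda

t = 0
for epoch in range(3):
    for x, y in data:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
        t += 1
        lam_t = base_lambda * t           # lambda_t = lambda * t grows linearly
        with torch.no_grad():
            for w in model.parameters():  # in practice, only the weights being quantized
                prox_binary_(w, lam_t)
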