ProxQuant: Quantized Neural Networks via Proximal Operators
Authors: Yu Bai, Yu-Xiang Wang, Edo Liberty
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness and flexibility of PROXQUANT through systematic experiments on (1) image classification with ResNets (Section 4.1); (2) language modeling with LSTMs (Section 4.2). The PROXQUANT method outperforms the state-of-the-art results on binary quantization and is comparable with the state-of-the-art on multi-bit quantization. We perform image classification on the CIFAR-10 dataset, which contains 50000 training images and 10000 test images of size 32×32. |
| Researcher Affiliation | Collaboration | Yu Bai (Stanford University, yub@stanford.edu); Yu-Xiang Wang (UC Santa Barbara, yuxiangw@cs.ucsb.edu); Edo Liberty (Amazon AI, libertye@amazon.com) |
| Pseudocode | Yes | Algorithm 1 PROXQUANT: Prox-gradient method for quantized net training |
| Open Source Code | Yes | Code available at https://github.com/allenbai01/ProxQuant. |
| Open Datasets | Yes | We perform image classification on the CIFAR-10 dataset, which contains 50000 training images and 10000 test images of size 32×32. We perform language modeling with LSTMs (Hochreiter & Schmidhuber, 1997) on the Penn Treebank (PTB) dataset (Marcus et al., 1993), which contains 929K training tokens, 73K validation tokens, and 82K test tokens. |
| Dataset Splits | Yes | We perform image classification on the CIFAR-10 dataset, which contains 50000 training images and 10000 test images of size 32×32. We perform language modeling with LSTMs (Hochreiter & Schmidhuber, 1997) on the Penn Treebank (PTB) dataset (Marcus et al., 1993), which contains 929K training tokens, 73K validation tokens, and 82K test tokens. |
| Hardware Specification | No | The paper does not explicitly state the hardware specifications (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions components such as the Adam optimizer, but does not provide version numbers for any software dependencies. |
| Experiment Setup | Yes | We use the homotopy method λ_t = λ·t with λ = 10⁻⁴ as the regularization strength and Adam with constant learning rate 0.01 as the optimizer. For BinaryConnect, we train with the recommended Adam optimizer with learning rate decay (Courbariaux et al., 2015) (initial learning rate 0.01, multiplied by 0.1 at epochs 81 and 122). A minimal sketch of this update appears below the table. |
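
The Pseudocode and Experiment Setup rows describe the core update of Algorithm 1: a stochastic gradient step on the full-precision weights followed by a proximal step, with the regularization strength growing linearly over iterations (the homotopy schedule λ_t = λ·t). Below is a minimal NumPy sketch of that prox-gradient update, assuming the paper's binary (W-shaped) regularizer, whose proximal operator soft-thresholds each weight toward the nearest of ±1. The function names and the plain-SGD step are illustrative assumptions; the reported experiments use Adam, and the released code linked above is the authoritative implementation.

```python
import numpy as np

def prox_binary(theta, lam):
    """Prox of lam * sum_j min(|theta_j - 1|, |theta_j + 1|) (W-shaped regularizer).

    Each weight is soft-thresholded toward its nearest binary point
    (+1 or -1): it moves lam closer, and snaps exactly once the
    remaining gap is at most lam.
    """
    q = np.sign(theta)                       # nearest quantization point per weight
    q[q == 0] = 1.0                          # break ties at 0 toward +1
    gap = theta - q
    return q + np.sign(gap) * np.maximum(np.abs(gap) - lam, 0.0)

def proxquant_step(theta, grad, lr, lam_base, t):
    """One prox-gradient update in the style of Algorithm 1 (sketch).

    Homotopy schedule: lambda_t = lam_base * t, so the pull toward
    {-1, +1} strengthens as training progresses. Plain SGD stands in
    for Adam to keep the sketch short.
    """
    lam_t = lam_base * t
    theta = theta - lr * grad                # gradient step on full-precision weights
    return prox_binary(theta, lr * lam_t)    # standard prox-gradient scaling by the step size

# Toy usage with random data (illustrative only).
theta = np.random.randn(8)
for t in range(1, 1001):
    grad = np.random.randn(8)                # placeholder for a minibatch gradient
    theta = proxquant_step(theta, grad, lr=0.01, lam_base=1e-4, t=t)
```

The design point the sketch tries to capture is that the prox only partially moves weights toward ±1 early in training and pins them exactly once λ_t becomes large, which is the soft-to-hard transition the homotopy schedule provides.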