Relaxed Quantization for Discretized Neural Networks
Authors: Christos Louizos, Matthias Reisser, Tijmen Blankevoort, Efstratios Gavves, Max Welling
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally validate the performance of our method on MNIST, CIFAR 10 and Imagenet classification. |
| Researcher Affiliation | Collaboration | Christos Louizos, University of Amsterdam & TNO Intelligent Imaging, c.louizos@uva.nl; Matthias Reisser, QUVA Lab, University of Amsterdam, m.reisser@uva.nl; Tijmen Blankevoort, Qualcomm AI Research, tijmen@qti.qualcomm.com; Efstratios Gavves, QUVA Lab, University of Amsterdam, egavves@uva.nl; Max Welling, University of Amsterdam & Qualcomm, m.welling@uva.nl |
| Pseudocode | Yes | Algorithm 1: Quantization during training. ... Algorithm 2: Quantization during testing. (A hedged sketch of such a quantization step follows the table.) |
| Open Source Code | No | The paper mentions that experiments were implemented with TensorFlow and Keras, and refers to a TensorFlow GitHub repository for a pre-trained MobileNet model and for Jacob et al. (2017)'s code, but does not provide a link to the authors' own implementation of RQ. |
| Open Datasets | Yes | We experimentally validate the performance of our method on MNIST, CIFAR 10 and Imagenet classification. |
| Dataset Splits | Yes | The final models were determined through early stopping using the validation loss computed with minibatch statistics, in case the model uses batch normalization. |
| Hardware Specification | Yes | In terms of wall-clock time, training the RQ model with a full (4 elements) grid took approximately 15 times as long as the high-precision baseline with an implementation in Tensorflow v1.11.0 and running on a single Titan-X Nvidia GPU. |
| Software Dependencies | Yes | All experiments were implemented with TensorFlow (Abadi et al., 2015), using the Keras library (Chollet et al., 2015). ... running on a single Titan-X Nvidia GPU. |
| Experiment Setup | Yes | For the MNIST experiment we rescaled the input to the [-1, 1] range, employed no regularization and the network was trained with Adam (Kingma & Ba, 2014) and a batch size of 128. We used a local grid whenever the bit width was larger than 2 for both weights and biases (shared grid parameters), as well as for the outputs of the ReLU, with δ = 3. For the 8 and 4 bit networks we used a temperature λ of 2 whereas for the 2 bit models we used a temperature of 1 for RQ. We trained the 8 and 4 bit networks for 100 epochs using a learning rate of 1e-3 and the 2 bit networks for 200 epochs with a learning rate of 5e-4. In all of the cases the learning rate was annealed to zero during the last 50 epochs. (A hedged sketch of this training configuration also follows the table.) |
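
The Pseudocode row above only quotes the titles of Algorithms 1 and 2, not their bodies. As a rough illustration of what such a quantization step can look like, the snippet below sketches generic stochastic grid quantization with a Gumbel-softmax (concrete) relaxation at temperature λ during training and nearest-grid-point assignment at test time. The grid construction, the logistic noise scale `sigma`, and the function name `relaxed_quantize` are illustrative assumptions, not the authors' code.

```python
import tensorflow as tf

def relaxed_quantize(w, grid, sigma=0.1, lam=2.0, training=True):
    """Map values `w` onto `grid` points, softly during training.

    Sketch only: each value gets a categorical distribution over grid
    points from the logistic-noise mass falling into each grid cell; a
    Gumbel-softmax sample at temperature `lam` keeps training
    differentiable, while test time snaps to the most likely grid point.
    """
    delta = grid[1] - grid[0]                       # assume a uniform grid
    w = tf.expand_dims(w, -1)                       # [..., 1] against grid [K]
    upper = (grid + delta / 2.0 - w) / sigma
    lower = (grid - delta / 2.0 - w) / sigma
    probs = tf.sigmoid(upper) - tf.sigmoid(lower)   # noise mass per grid cell
    logits = tf.math.log(probs + 1e-9)

    if training:
        # Relaxed one-hot sample (Gumbel-softmax) over the grid points.
        u = tf.random.uniform(tf.shape(logits), 1e-9, 1.0)
        gumbel = -tf.math.log(-tf.math.log(u))
        z = tf.nn.softmax((logits + gumbel) / lam, axis=-1)
    else:
        # Deterministic rounding to the most probable grid point.
        z = tf.one_hot(tf.argmax(logits, axis=-1), depth=tf.shape(grid)[0])

    return tf.reduce_sum(z * grid, axis=-1)

# Example: a 4-bit grid on [-1, 1], applied to a random weight tensor.
grid = tf.linspace(-1.0, 1.0, 2 ** 4)
w = tf.random.normal([64, 64])
w_q = relaxed_quantize(w, grid, sigma=0.1, lam=2.0, training=True)
```

At a high level this mirrors the quoted train/test split: stochastic, differentiable quantization while learning, hard rounding to the grid at inference.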
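
The Experiment Setup row quotes the MNIST hyperparameters: inputs rescaled to [-1, 1], Adam with batch size 128, 100 epochs at learning rate 1e-3 for the 8/4-bit runs, with the learning rate annealed to zero over the last 50 epochs. A minimal Keras sketch of that schedule is below; the plain dense model, the loss choice, and the linear shape of the annealing are stand-in assumptions, and the RQ layers themselves are not reproduced.

```python
import tensorflow as tf

EPOCHS, ANNEAL_EPOCHS, BASE_LR = 100, 50, 1e-3

# MNIST inputs rescaled to the [-1, 1] range, as in the quoted setup.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 127.5 - 1.0

def lr_schedule(epoch, lr):
    # Constant learning rate, then a linear ramp toward zero
    # during the final ANNEAL_EPOCHS epochs.
    remaining = EPOCHS - epoch
    if remaining > ANNEAL_EPOCHS:
        return BASE_LR
    return BASE_LR * remaining / ANNEAL_EPOCHS

# Placeholder classifier; the paper's quantized layers are not shown here.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(300, activation="relu"),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(BASE_LR),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(
    x_train, y_train, batch_size=128, epochs=EPOCHS,
    callbacks=[tf.keras.callbacks.LearningRateScheduler(lr_schedule)],
)
```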