Relaxed Quantization for Discretized Neural Networks

Authors: Christos Louizos, Matthias Reisser, Tijmen Blankevoort, Efstratios Gavves, Max Welling

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally validate the performance of our method on MNIST, CIFAR 10 and Imagenet classification.
Researcher Affiliation | Collaboration | Christos Louizos (University of Amsterdam, TNO Intelligent Imaging) c.louizos@uva.nl; Matthias Reisser (QUVA Lab, University of Amsterdam) m.reisser@uva.nl; Tijmen Blankevoort (Qualcomm AI Research) tijmen@qti.qualcomm.com; Efstratios Gavves (QUVA Lab, University of Amsterdam) egavves@uva.nl; Max Welling (University of Amsterdam, Qualcomm) m.welling@uva.nl
Pseudocode | Yes | Algorithm 1: Quantization during training. ... Algorithm 2: Quantization during testing.
Open Source Code | No | The paper mentions that experiments were implemented with TensorFlow and Keras, and refers to a TensorFlow GitHub repository for a pre-trained MobileNet model and for Jacob et al. (2017)'s code, but does not provide a link to the authors' own implementation of RQ.
Open Datasets | Yes | We experimentally validate the performance of our method on MNIST, CIFAR 10 and Imagenet classification.
Dataset Splits | Yes | The final models were determined through early stopping using the validation loss computed with minibatch statistics, in case the model uses batch normalization.
Hardware Specification | Yes | In terms of wall-clock time, training the RQ model with a full (4 elements) grid took approximately 15 times as long as the high-precision baseline with an implementation in Tensorflow v1.11.0 and running on a single Titan-X Nvidia GPU.
Software Dependencies | Yes | All experiments were implemented with TensorFlow (Abadi et al., 2015), using the Keras library (Chollet et al., 2015). ... running on a single Titan-X Nvidia GPU.
Experiment Setup | Yes | For the MNIST experiment we rescaled the input to the [-1, 1] range, employed no regularization and the network was trained with Adam (Kingma & Ba, 2014) and a batch size of 128. We used a local grid whenever the bit width was larger than 2 for both weights and biases (shared grid parameters), as well as for the outputs of the ReLU, with δ = 3. For the 8 and 4 bit networks we used a temperature λ of 2, whereas for the 2 bit models we used a temperature of 1 for RQ. We trained the 8 and 4 bit networks for 100 epochs using a learning rate of 1e-3 and the 2 bit networks for 200 epochs with a learning rate of 5e-4. In all cases the learning rate was annealed to zero during the last 50 epochs.
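The Pseudocode row above refers to the paper's Algorithm 1 (quantization during training) and Algorithm 2 (quantization during testing). The snippet below is only a rough TensorFlow 2 sketch of that train/test split under stated assumptions: a logistic noise model over an evenly spaced grid, a Gumbel-softmax (concrete) relaxation at temperature λ during training, and deterministic rounding to the nearest grid point at test time. The function names, the boundary handling, and the exact noise model are assumptions, not the authors' released code.

```python
import tensorflow as tf

def rq_quantize_train(x, grid, sigma, lam):
    # Sketch of "quantization during training": soften the categorical choice
    # of a grid point with a Gumbel-softmax sample at temperature lam.
    x = tf.expand_dims(x, -1)                    # shape [..., 1] against grid [K]
    half = (grid[1] - grid[0]) / 2.0             # half the grid spacing
    # Probability mass of the logistic-noise-perturbed value landing in each
    # cell (interior cells only; boundary cells are not treated specially here).
    probs = (tf.sigmoid((grid + half - x) / sigma)
             - tf.sigmoid((grid - half - x) / sigma))
    logits = tf.math.log(probs + 1e-10)
    u = tf.random.uniform(tf.shape(logits), minval=1e-10, maxval=1.0)
    gumbel = -tf.math.log(-tf.math.log(u))       # standard Gumbel noise
    z = tf.nn.softmax((logits + gumbel) / lam, axis=-1)
    return tf.reduce_sum(z * grid, axis=-1)      # soft assignment to grid points

def rq_quantize_test(x, grid):
    # Sketch of "quantization during testing": deterministic rounding to the
    # nearest grid point.
    idx = tf.argmin(tf.abs(tf.expand_dims(x, -1) - grid), axis=-1)
    return tf.gather(grid, idx)
```

As a usage illustration, `rq_quantize_train(w, tf.linspace(-1.0, 1.0, 16), sigma=0.05, lam=2.0)` would relax a 4-bit quantization of a weight tensor `w`; the local-grid restriction (δ = 3) mentioned in the Experiment Setup row is omitted here for brevity.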
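The MNIST recipe in the Experiment Setup row maps fairly directly onto a Keras training script. The sketch below assumes TensorFlow 2, uses a plain fully connected stand-in network (the paper's quantized architecture and the RQ quantizers themselves are not reproduced), and treats the unstated annealing shape as linear; the 8/4-bit values for epochs and learning rate are shown, with the 2-bit values noted in comments.

```python
import tensorflow as tf

EPOCHS = 100        # 8- and 4-bit setting; the 2-bit models use 200 epochs
BASE_LR = 1e-3      # 8- and 4-bit setting; the 2-bit models use 5e-4
ANNEAL = 50         # learning rate annealed to zero over the last 50 epochs

def lr_schedule(epoch, lr):
    # Linear decay to zero during the final ANNEAL epochs
    # (the decay shape is an assumption; the paper only says "annealed to zero").
    start = EPOCHS - ANNEAL
    if epoch < start:
        return BASE_LR
    return BASE_LR * (EPOCHS - epoch) / ANNEAL

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 127.5 - 1.0   # rescale inputs to [-1, 1]

# Stand-in classifier with no regularization, as described; the RQ
# weight/activation quantizers are not included in this sketch.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(300, activation="relu"),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(BASE_LR),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=128, epochs=EPOCHS,
          callbacks=[tf.keras.callbacks.LearningRateScheduler(lr_schedule)])
```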