Variational Network Quantization

Authors: Jan Achterhold, Jan Mathias Koehler, Anke Schmeink, Tim Genewein

ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Results are shown for ternary quantization on LeNet-5 (MNIST) and DenseNet (CIFAR-10). In our experiments, we train with VNQ and then first prune via thresholding log α_ij ≥ log T_α = 2. We demonstrate our method with LeNet-5 (LeCun et al., 1998) on the MNIST handwritten digits dataset. Our second experiment uses a modern DenseNet (Huang et al., 2017) (k = 12, depth L = 76, with bottlenecks) on CIFAR-10 (Krizhevsky & Hinton, 2009). (A sketch of the thresholding step appears after this table.)
Researcher Affiliation | Collaboration | Jan Achterhold1,2, Jan M. Köhler1, Anke Schmeink2 & Tim Genewein1,*; 1Bosch Center for Artificial Intelligence, Robert Bosch GmbH, Renningen, Germany; 2RWTH Aachen University, Institute for Theoretical Information Technology, Aachen, Germany
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not state that source code is released, nor does it include a link to a code repository for the described methodology.
Open Datasets | Yes | We demonstrate our method with LeNet-5 (LeCun et al., 1998) on the MNIST handwritten digits dataset. Our second experiment uses a modern DenseNet (Huang et al., 2017) (k = 12, depth L = 76, with bottlenecks) on CIFAR-10 (Krizhevsky & Hinton, 2009).
Dataset Splits | No | The paper refers to 'validation accuracy' and 'validation error' in its results and training description, e.g. a 'validation accuracy of 99.2%', but it does not give the split percentages or sample counts used for the validation set, nor does it cite a predefined validation split.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models.
Software Dependencies | No | The paper mentions software such as 'Caffe' and the 'Adam optimizer' but does not specify version numbers for these or other software dependencies.
Experiment Setup | Yes | For LeNet-5: we initialize means θ with the pre-trained weights and variances with log σ² = −8. The warm-up factor β is linearly increased from 0 to 1 during the first 15 epochs. VNQ training runs for a total of 195 epochs with a batch size of 128; the learning rate is linearly decreased from 0.001 to 0, and the learning rate for adjusting the codebook parameter a is 100 times lower. For DenseNet: we use a batch size of 64 samples; the warm-up weight β of the KL term is 0 for the first 5 epochs and is then linearly ramped from 0 to 1 over the next 15 epochs; the learning rate of 0.005 is kept constant for the first 50 epochs and then linearly decreased to 0.003 when training stops after 150 epochs.
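
The pruning step quoted in the Research Type row is described only in prose; the paper provides no code. Below is a minimal PyTorch-style sketch of thresholding the dropout parameter log α_ij against log T_α = 2, assuming the variational posterior is parameterized per weight by a mean θ_ij and a log-variance log σ²_ij. The tensor names and shapes are illustrative assumptions, not taken from the paper.

```python
import torch

# Illustrative posterior parameters for one fully connected layer
# (shapes and names are assumptions, not from the paper):
theta = torch.randn(500, 800)              # posterior means of the weights
log_sigma2 = torch.full_like(theta, -8.0)  # posterior log-variances; the paper initializes log sigma^2 = -8

# Dropout parameter alpha_ij = sigma_ij^2 / theta_ij^2, computed in log-space
# for numerical stability.
log_alpha = log_sigma2 - torch.log(theta.pow(2) + 1e-8)

# Prune (zero out) every weight whose log alpha exceeds the threshold
# log T_alpha = 2 quoted in the table above.
log_T_alpha = 2.0
keep_mask = (log_alpha < log_T_alpha).to(theta.dtype)
pruned_theta = theta * keep_mask

print(f"pruned fraction: {1.0 - keep_mask.mean().item():.2%}")
```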
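
Similarly, the linear KL warm-up and learning-rate schedules quoted in the Experiment Setup row can be written as small helper functions. The sketch below uses the LeNet-5 numbers from the table (15 warm-up epochs, 195 training epochs, learning rate decayed linearly from 0.001 to 0); it is an interpretation of the quoted text, not code released by the authors.

```python
def kl_warmup_beta(epoch: int, warmup_epochs: int = 15) -> float:
    """Linearly ramp the KL weight beta from 0 to 1 over the first warm-up epochs."""
    return min(1.0, epoch / warmup_epochs)


def lenet_learning_rate(epoch: int, total_epochs: int = 195, base_lr: float = 1e-3) -> float:
    """Linearly decay the learning rate from base_lr to 0 over the full training run."""
    return base_lr * max(0.0, 1.0 - epoch / total_epochs)


if __name__ == "__main__":
    # Illustrative values of both schedules at a few epochs.
    for epoch in (0, 10, 15, 100, 195):
        print(epoch, kl_warmup_beta(epoch), round(lenet_learning_rate(epoch), 6))
```

Per the quoted setup, the codebook parameter a would be updated with a learning rate 100 times lower than the value returned above; the DenseNet run would instead hold 0.005 constant for 50 epochs before decaying to 0.003 at epoch 150.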