Variational Network Quantization
Authors: Jan Achterhold, Jan Mathias Koehler, Anke Schmeink, Tim Genewein
ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results are shown for ternary quantization of LeNet-5 (MNIST) and DenseNet (CIFAR-10). In our experiments, we train with VNQ and then first prune via thresholding log α_ij ≥ log T_α = 2. We demonstrate our method with LeNet-5 (LeCun et al., 1998) on the MNIST handwritten digits dataset. Our second experiment uses a modern DenseNet (Huang et al., 2017) (k = 12, depth L = 76, with bottlenecks) on CIFAR-10 (Krizhevsky & Hinton, 2009). A hedged code sketch of this prune-then-quantize step follows the table. |
| Researcher Affiliation | Collaboration | Jan Achterhold1,2, Jan M. Köhler1, Anke Schmeink2 & Tim Genewein1,*. 1Bosch Center for Artificial Intelligence, Robert Bosch GmbH, Renningen, Germany; 2RWTH Aachen University, Institute for Theoretical Information Technology, Aachen, Germany |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper neither states that source code will be released nor links to a code repository for the described method. |
| Open Datasets | Yes | We demonstrate our method with LeNet-5 (LeCun et al., 1998) on the MNIST handwritten digits dataset. Our second experiment uses a modern DenseNet (Huang et al., 2017) (k = 12, depth L = 76, with bottlenecks) on CIFAR-10 (Krizhevsky & Hinton, 2009). |
| Dataset Splits | No | The paper reports 'validation accuracy' and 'validation error' during training and in its results, for example 'validation accuracy of 99.2%'. However, it does not give the split percentages or sample counts used for the validation set, nor does it cite a predefined validation split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models. |
| Software Dependencies | No | The paper mentions software like 'Caffe' and 'Adam optimizer' but does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We initialize means θ with the pre-trained weights and variances with log σ² = −8. The warm-up factor β is linearly increased from 0 to 1 during the first 15 epochs. VNQ training runs for a total of 195 epochs with a batch size of 128, the learning rate is linearly decreased from 0.001 to 0, and the learning rate for adjusting the codebook parameter a is 100 times lower. For DenseNet: we use a batch size of 64 samples, the warm-up weight β of the KL term is 0 for the first 5 epochs and is then linearly ramped up from 0 to 1 over the next 15 epochs, and the learning rate of 0.005 is kept constant for the first 50 epochs and then linearly decreased to 0.003 when training stops after 150 epochs. Both schedules are sketched in code below the table. |
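
The prune-then-quantize step quoted in the Research Type row can be made concrete. The sketch below is a minimal reading of that step, assuming the sparse-variational-dropout convention log α = log σ² − log θ² and a symmetric ternary codebook {−a, 0, +a}; the function name `vnq_ternarize` and the tensor layout are our assumptions, not the authors' released code.

```python
import torch

def vnq_ternarize(theta: torch.Tensor, log_sigma2: torch.Tensor,
                  a: float, log_t_alpha: float = 2.0) -> torch.Tensor:
    """Sketch of VNQ's post-training prune-then-quantize step."""
    # Per-weight dropout rate, following the sparse variational-dropout
    # convention log alpha = log sigma^2 - log theta^2 (an assumption;
    # the paper's exact definition may differ).
    log_alpha = log_sigma2 - torch.log(theta.pow(2) + 1e-8)
    keep = log_alpha < log_t_alpha  # prune where log alpha >= log T_alpha = 2

    # Snap surviving means to the nearest codebook value in {-a, 0, +a}.
    codebook = torch.tensor([-a, 0.0, a], dtype=theta.dtype)
    idx = (theta.unsqueeze(-1) - codebook).abs().argmin(dim=-1)
    quantized = codebook[idx]

    # Pruned weights go to zero; the rest take their codebook value.
    return torch.where(keep, quantized, torch.zeros_like(theta))
```

For a layer with learned tensors `theta` and `log_sigma2`, a call such as `vnq_ternarize(theta, log_sigma2, a=0.05)` (the codebook scale 0.05 is purely illustrative) would yield the deployable ternary weights.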
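
The two training schedules from the Experiment Setup row can likewise be written down directly. The helpers below are assembled only from the quoted hyper-parameters; the function names and the per-epoch (rather than per-step) update granularity are assumptions, since the paper does not state how finely the schedules are applied.

```python
def lenet_schedule(epoch: int, total_epochs: int = 195,
                   warmup_epochs: int = 15, base_lr: float = 1e-3):
    """KL warm-up factor and learning rates for the LeNet-5 run (sketch)."""
    beta = min(1.0, epoch / warmup_epochs)        # linear 0 -> 1 over 15 epochs
    lr = base_lr * (1.0 - epoch / total_epochs)   # linear 0.001 -> 0
    lr_codebook = lr / 100.0                      # codebook parameter a: 100x lower
    return beta, lr, lr_codebook

def densenet_schedule(epoch: int, total_epochs: int = 150):
    """KL warm-up factor and learning rate for the DenseNet run (sketch)."""
    # beta: 0 for the first 5 epochs, then linear ramp 0 -> 1 over the next 15.
    beta = 0.0 if epoch < 5 else min(1.0, (epoch - 5) / 15.0)
    # lr: 0.005 constant for 50 epochs, then linear decay to 0.003 at epoch 150.
    if epoch < 50:
        lr = 0.005
    else:
        lr = 0.005 + (0.003 - 0.005) * (epoch - 50) / (total_epochs - 50)
    return beta, lr
```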