Mixed Precision DNNs: All you need is a good parametrization
Authors: Stefan Uhlich, Lukas Mauch, Fabien Cardinaux, Kazuki Yoshiyama, Javier Alonso García, Stephen Tiedemann, Thomas Kemp, Akira Nakamura
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We confirm our findings with experiments on CIFAR-10 and ImageNet, and we obtain mixed precision DNNs with learned quantization parameters, achieving state-of-the-art performance. |
| Researcher Affiliation | Industry | Stefan Uhlich, Lukas Mauch, Fabien Cardinaux, Kazuki Yoshiyama, Javier Alonso García, Stephen Tiedemann, Thomas Kemp (Sony Europe B.V., Germany, firstname.lastname@sony.com); Akira Nakamura (Sony Corporate, Japan, akira.b.nakamura@sony.com) |
| Pseudocode | Yes | The following code gives our differentiable quantizer implementation in NNabla (Sony). The source code for reproducing our results will be published after the review process has been finished. |
| Open Source Code | No | The source code for reproducing our results will be published after the review process has been finished. |
| Open Datasets | Yes | We confirm our findings with experiments on CIFAR-10 and ImageNet. |
| Dataset Splits | Yes | Fig. 4 shows the evolution of the training and validation error during training for the case of uniform quantization. The plots for power-of-two quantization can be found in the appendix (Fig. 10). We initialize this network from random parameters or from a pre-trained float network. |
| Hardware Specification | Yes | Each epoch takes about 2.5 min on a single GTX 1080 Ti. |
| Software Dependencies | No | The following code gives our differentiable quantizer implementation in NNabla (Sony). |
| Experiment Setup | Yes | The quantized DNNs are trained for 160 epochs, using SGD with momentum 0.9 and a learning rate schedule starting with 0.01 and reducing it by a factor of 10 after 80 and 120 epochs, respectively. We use random flips and crops for data augmentation. |
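The paper's differentiable quantizer was implemented in NNabla and its source was not released at review time, so the exact code is unavailable. The following is a minimal NumPy sketch of the two quantization schemes the report mentions (uniform and power-of-two); the function names, the 4-bit example, and the step size are illustrative assumptions, not the authors' code. In the actual method, the non-differentiable `round` is bypassed with a straight-through estimator so gradients can flow to both the input and the learned quantization parameters.

```python
import numpy as np

def uniform_quantize(x, step, bits):
    """Sketch of uniform quantization with a learnable step size.

    Maps x onto the signed grid {q_min, ..., q_max} * step. During
    training, round() would use a straight-through estimator.
    """
    q_min = -(2 ** (bits - 1))       # most negative integer level
    q_max = 2 ** (bits - 1) - 1      # most positive integer level
    return step * np.clip(np.round(x / step), q_min, q_max)

def pow2_quantize(x, max_exp, bits):
    """Sketch of power-of-two quantization: magnitudes snap to 2^k.

    max_exp is the (learnable) largest exponent; bits bounds the
    number of representable exponent levels below it.
    """
    sign = np.sign(x)
    # avoid log2(0); zero inputs stay zero via the sign factor
    exp = np.round(np.log2(np.maximum(np.abs(x), 1e-12)))
    exp = np.clip(exp, max_exp - 2 ** (bits - 1) + 1, max_exp)
    return sign * 2.0 ** exp

x = np.array([-1.3, -0.05, 0.4, 2.7])
print(uniform_quantize(x, step=0.25, bits=4))
```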
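The training recipe quoted above (160 epochs, SGD with momentum 0.9, learning rate 0.01 divided by 10 after epochs 80 and 120) can be expressed as a step schedule. This is a generic sketch of that schedule, not the authors' implementation; the helper name is hypothetical.

```python
def lr_at_epoch(epoch, base_lr=0.01, milestones=(80, 120), gamma=0.1):
    """Step learning-rate schedule: multiply by gamma at each milestone.

    Matches the reported setup: 0.01 initially, 0.001 from epoch 80,
    0.0001 from epoch 120, trained for 160 epochs total.
    """
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

for e in (0, 80, 120):
    print(e, lr_at_epoch(e))
```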