Not All Bits have Equal Value: Heterogeneous Precisions via Trainable Noise
Authors: Pedro Savarese, Xin Yuan, Yanjing Li, Michael Maire
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that it finds highly heterogeneous precision assignments for CNNs trained on CIFAR and ImageNet, improving upon previous state-of-the-art quantization methods. Our improvements extend to the challenging scenario of learning reduced-precision GANs. |
| Researcher Affiliation | Academia | Pedro Savarese (TTI-Chicago) savarese@ttic.edu; Xin Yuan (University of Chicago) yuanx@uchicago.edu; Yanjing Li (University of Chicago) yanjingl@uchicago.edu; Michael Maire (University of Chicago) mmaire@uchicago.edu |
| Pseudocode | Yes | Algorithm 1 SMOL |
| Open Source Code | No | [No] We will release full source code upon paper acceptance. |
| Open Datasets | Yes | We first compare SMOL against different quantization methods on the small-scale CIFAR-10 dataset |
| Dataset Splits | Yes | We first compare SMOL against different quantization methods on the small-scale CIFAR-10 dataset |
| Hardware Specification | No | Our experiments involve training standard deep neural network models on modern GPUs; we include details on training epochs used in all experiments. ... a batch size of 256 which is distributed across 4 GPUs. The paper does not specify the GPU models or types used. |
| Software Dependencies | No | For all experiments we train the auxiliary parameters s with Adam [22], using the default learning rate of 10^-3 and no weight decay; all its other hyperparameters are set to their default values. No specific version numbers for software or libraries are provided. |
| Experiment Setup | Yes | We adopt the standard data augmentation procedure of applying random translations and horizontal flips to training images, and train each network for a total of 650 epochs: the precisions are trained with SMOL for the first 350 while the remaining 300 are used to fine-tune the weights while the precisions remain fixed. ... To train the weights we use SGD with a momentum of 0.9 and an initial learning rate of 0.1, which is decayed at epochs 250, 500, and 600. We use a batch size of 128 and a weight decay of 10^-4 for ResNet-20, 4×10^-5 for MobileNetV2, and 5×10^-4 for ShuffleNet. (A hedged code sketch of this schedule follows the table.) |
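
The experiment-setup row quotes enough hyperparameters to outline the paper's two-optimizer training loop. The sketch below is a minimal PyTorch reconstruction under assumptions: `model`, `aux_params` (the precision parameters s), `loader`, and `loss_fn` are placeholders, since the SMOL source code is not released, and the learning-rate decay factor of 0.1 is an assumption (the quoted text gives only the decay epochs).

```python
# Hedged sketch of the CIFAR-10 training schedule quoted above; not the authors' code.
import torch
from torch import optim

def build_optimizers(model, aux_params, weight_decay=1e-4):
    # Weights: SGD, momentum 0.9, initial lr 0.1, decayed at epochs 250/500/600.
    # weight_decay is 1e-4 for ResNet-20, 4e-5 for MobileNetV2, 5e-4 for ShuffleNet.
    w_opt = optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                      weight_decay=weight_decay)
    # Decay factor 0.1 is assumed; the quote only lists the milestone epochs.
    w_sched = optim.lr_scheduler.MultiStepLR(w_opt, milestones=[250, 500, 600], gamma=0.1)
    # Auxiliary precision parameters s: Adam with default lr 1e-3, no weight decay.
    s_opt = optim.Adam(aux_params, lr=1e-3, weight_decay=0.0)
    return w_opt, w_sched, s_opt

def train(model, aux_params, loader, loss_fn, device="cuda"):
    w_opt, w_sched, s_opt = build_optimizers(model, aux_params)
    for epoch in range(650):
        # Precisions are trained for the first 350 epochs, then frozen while
        # the weights are fine-tuned for the remaining 300.
        train_precisions = epoch < 350
        for x, y in loader:  # batch size 128, random translations + horizontal flips
            x, y = x.to(device), y.to(device)
            loss = loss_fn(model(x), y)
            w_opt.zero_grad()
            if train_precisions:
                s_opt.zero_grad()
            loss.backward()
            w_opt.step()
            if train_precisions:
                s_opt.step()
        w_sched.step()
```

The two optimizers mirror the split described in the quotes: SGD for the network weights and Adam for the auxiliary precision parameters; after epoch 350 only the weight optimizer steps, which keeps the learned precisions fixed during fine-tuning.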