Not All Bits have Equal Value: Heterogeneous Precisions via Trainable Noise

Authors: Pedro Savarese, Xin Yuan, Yanjing Li, Michael Maire

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments show that it finds highly heterogeneous precision assignments for CNNs trained on CIFAR and ImageNet, improving upon previous state-of-the-art quantization methods. Our improvements extend to the challenging scenario of learning reduced-precision GANs."
Researcher Affiliation | Academia | Pedro Savarese, TTI-Chicago (savarese@ttic.edu); Xin Yuan, University of Chicago (yuanx@uchicago.edu); Yanjing Li, University of Chicago (yanjingl@uchicago.edu); Michael Maire, University of Chicago (mmaire@uchicago.edu)
Pseudocode | Yes | "Algorithm 1 SMOL"
Open Source Code | No | "[No] We will release full source code upon paper acceptance."
Open Datasets | Yes | "We first compare SMOL against different quantization methods on the small-scale CIFAR-10 dataset"
Dataset Splits | Yes | "We first compare SMOL against different quantization methods on the small-scale CIFAR-10 dataset"
Hardware Specification | No | "Our experiments involve training standard deep neural network models on modern GPUs; we include details on training epochs used in all experiments." ... "a batch size of 256 which is distributed across 4 GPUs." The paper does not specify GPU models or types.
Software Dependencies | No | "For all experiments we train the auxiliary parameters s with Adam [22], using the default learning rate of 10⁻³ and no weight decay; all its other hyperparameters are set to their default values." No version numbers are given for software or libraries.
Experiment Setup | Yes | "We adopt the standard data augmentation procedure of applying random translations and horizontal flips to training images, and train each network for a total of 650 epochs: the precisions are trained with SMOL for the first 350 while the remaining 300 are used to fine-tune the weights while the precisions remain fixed. ... To train the weights we use SGD with a momentum of 0.9 and an initial learning rate of 0.1, which is decayed at epochs 250, 500, and 600. We use a batch size of 128 and a weight decay of 10⁻⁴ for ResNet-20, 4×10⁻⁵ for MobileNetV2, and 5×10⁻⁴ for ShuffleNet."
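To make the quoted hyperparameters concrete, the sketch below assembles the Experiment Setup and Software Dependencies details (ResNet-20 on CIFAR-10) into a minimal PyTorch-style training loop. It is a sketch under assumptions, not the authors' implementation: the framework (PyTorch), the learning-rate decay factor of 0.1, and the helpers build_model(), model.weight_parameters(), model.precision_parameters(), and train_loader are all assumptions not stated in the excerpts.

```python
# Minimal sketch of the quoted setup: SGD for the network weights, Adam for the
# auxiliary precision parameters s, 650 epochs total (350 of precision search with
# SMOL, then 300 of weight-only fine-tuning with the precisions frozen).
import torch
import torch.nn.functional as F

TOTAL_EPOCHS = 650
PRECISION_EPOCHS = 350          # precisions are trained only during the first 350 epochs
MILESTONES = [250, 500, 600]    # weight LR decay points quoted in the paper

model = build_model()           # hypothetical: a SMOL-augmented ResNet-20
train_loader = ...              # hypothetical: CIFAR-10 loader, batch size 128,
                                # random translations and horizontal flips

# SGD for the weights: momentum 0.9, initial LR 0.1, weight decay 1e-4 (ResNet-20 value;
# the paper quotes 4e-5 for MobileNetV2 and 5e-4 for ShuffleNet).
weight_opt = torch.optim.SGD(model.weight_parameters(),
                             lr=0.1, momentum=0.9, weight_decay=1e-4)
# Decay factor 0.1 is an assumption; the excerpt only gives the decay epochs.
weight_sched = torch.optim.lr_scheduler.MultiStepLR(weight_opt,
                                                    milestones=MILESTONES, gamma=0.1)

# Adam for the auxiliary parameters s: default LR 1e-3, no weight decay,
# all other hyperparameters at their defaults.
precision_opt = torch.optim.Adam(model.precision_parameters(), lr=1e-3, weight_decay=0.0)

for epoch in range(TOTAL_EPOCHS):
    for images, labels in train_loader:
        loss = F.cross_entropy(model(images), labels)
        weight_opt.zero_grad()
        precision_opt.zero_grad()
        loss.backward()
        weight_opt.step()
        if epoch < PRECISION_EPOCHS:   # after epoch 350 the precisions remain fixed
            precision_opt.step()
    weight_sched.step()
```

Keeping the two parameter groups under separate optimizers mirrors the quoted split: the weight schedule (SGD with step decay and a per-model weight decay) is tuned independently of the precision parameters, which simply use Adam's defaults.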