Heterogeneous Bitwidth Binarization in Convolutional Neural Networks

Authors: Joshua Fromm, Shwetak Patel, Matthai Philipose

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we show that it is feasible and useful to select bitwidths at the parameter granularity during training. For instance, a heterogeneously quantized version of modern networks such as AlexNet and MobileNet, with the right mix of 1-, 2-, and 3-bit parameters that average to just 1.4 bits, can equal the accuracy of homogeneous 2-bit versions of these networks. Further, we provide analyses to show that the heterogeneously binarized systems yield FPGA- and ASIC-based implementations that are correspondingly more efficient in both circuit area and energy efficiency than their homogeneous counterparts. We present a rigorous empirical evaluation (including on highly optimized modern networks such as Google's MobileNet) to show that heterogeneity yields equivalent accuracy at significantly lower average bitwidth. (An arithmetic check of the 1.4-bit mix appears after this table.)
Researcher Affiliation | Collaboration | Josh Fromm, Department of Electrical Engineering, University of Washington, Seattle, WA 98195, jwfromm@uw.edu; Shwetak Patel, Department of Computer Science, University of Washington, Seattle, WA 98195, shwetak@cs.washington.edu; Matthai Philipose, Microsoft Research, Redmond, WA 98052, matthaip@microsoft.com
Pseudocode | Yes | Algorithm 1: Generation of bit map M. (A Python sketch of this algorithm appears after this table.)
Input: a tensor T of size N and an average bitwidth B.
Output: a bit map M that can be used in Equation 5 to heterogeneously binarize T.
1: R = T (initialize R, which holds the values that have not yet been assigned a bitwidth)
2: x = 0
3: P = DistFromAvg(B) (generate a distribution of bits that fits the average)
4: for (b, p_b) in P do (b is a bitwidth and p_b is the percentage of T to binarize to width b)
5:   S = SortHeuristic(R) (sort indices of remaining values by suitability for b-bit binarization)
6:   M[S[x : x + p_b * N]] = b
7:   R = R \ R[S[x : x + p_b * N]] (do not consider these indices in the next step)
8:   x += p_b * N
9: end for
Open Source Code | No | The paper does not provide any concrete access information (e.g., a repository link or an explicit code-release statement) for its source code.
Open Datasets | Yes | AlexNet with batch normalization (AlexNet-BN) is the standard model used in binarization work due to its longevity and the general acceptance that improvements made to accuracy transfer well to more modern architectures. For this experiment, we used the CIFAR-10 dataset with a deliberately hobbled (4-layer fully convolutional) model. On the ImageNet dataset with AlexNet and MobileNet models, we perform extensive experiments to validate the effectiveness of HBNNs compared to the state of the art and full-precision accuracy.
Dataset Splits | No | The paper mentions using well-known datasets like CIFAR-10 and ImageNet but does not explicitly state the specific training/validation/test splits (e.g., percentages, sample counts, or citations to standard splits) used for reproducibility.
Hardware Specification | Yes | There have been several recent binary convolutional neural network implementations on FPGAs and ASICs that provide a baseline we can use to estimate the performance of HBNNs on ZC706 FPGA platforms (Umuroglu et al., 2017) and on ASIC hardware (Alemdar et al., 2017).
Software Dependencies | No | The paper mentions PyTorch but does not provide a specific version number for it or for any other key software dependency.
Experiment Setup | Yes | We train all models using an SGD solver with learning rate 0.01, momentum 0.9, and weight decay 1e-4, and randomly initialized weights, for 90 epochs on PyTorch. (A minimal PyTorch sketch of this recipe appears below.)
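The Research Type row above cites a mix of 1-, 2-, and 3-bit parameters that averages 1.4 bits. The snippet below is a minimal arithmetic check of one such mix; the specific fractions (70% 1-bit, 20% 2-bit, 10% 3-bit) are an illustrative assumption, not figures reported in the paper.

```python
# Hypothetical mix of bitwidths; only the 1.4-bit average comes from the paper.
fractions = {1: 0.70, 2: 0.20, 3: 0.10}  # bitwidth -> share of parameters

assert abs(sum(fractions.values()) - 1.0) < 1e-9
average_bits = sum(bits * share for bits, share in fractions.items())
print(average_bits)  # 0.7 + 0.4 + 0.3 = 1.4
```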
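Algorithm 1 (quoted in the Pseudocode row) leaves DistFromAvg and SortHeuristic unspecified. The sketch below is one possible NumPy rendering under stated assumptions: the distribution mixes the two bitwidths that bracket the target average, and the sorting heuristic ranks remaining values by magnitude (smallest first). The paper's actual heuristics may differ, and the function names here are hypothetical.

```python
import numpy as np

def dist_from_avg(avg_bits):
    """Hypothetical DistFromAvg: mix the two integer bitwidths bracketing
    avg_bits so that their weighted mean equals avg_bits."""
    lo, hi = int(np.floor(avg_bits)), int(np.ceil(avg_bits))
    if lo == hi:
        return {lo: 1.0}
    frac_hi = avg_bits - lo          # share of parameters given the larger width
    return {lo: 1.0 - frac_hi, hi: frac_hi}

def generate_bit_map(tensor, avg_bits):
    """Sketch of Algorithm 1: assign each element of `tensor` a bitwidth so the
    average bitwidth is approximately `avg_bits`."""
    flat = tensor.ravel()
    n = flat.size
    bit_map = np.zeros(n, dtype=np.int64)
    remaining = np.arange(n)                         # indices not yet assigned (R)
    for b, p_b in sorted(dist_from_avg(avg_bits).items()):
        # Hypothetical SortHeuristic: smallest magnitudes are treated as the
        # most suitable candidates for the current (lower) bitwidth.
        order = remaining[np.argsort(np.abs(flat[remaining]))]
        chosen = order[: int(round(p_b * n))]
        bit_map[chosen] = b
        remaining = np.setdiff1d(remaining, chosen)  # drop assigned indices
    if remaining.size:                               # rounding leftovers
        bit_map[remaining] = max(dist_from_avg(avg_bits))
    return bit_map.reshape(tensor.shape)

# Example: roughly 60% of values get 1 bit and 40% get 2 bits (average ~1.4).
weights = np.random.randn(4, 4)
print(generate_bit_map(weights, 1.4))
```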
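The Experiment Setup row reports the optimizer hyperparameters verbatim. Below is a minimal PyTorch sketch of that recipe; only the SGD settings and the 90-epoch budget come from the paper, while the model (a stock torchvision AlexNet) and the undefined `train_loader` are placeholders rather than the paper's binarized networks or data pipeline.

```python
import torch
import torchvision

# Placeholder model: stock AlexNet, not the heterogeneously binarized variant.
model = torchvision.models.alexnet(num_classes=1000)

# Hyperparameters as reported: SGD, lr 0.01, momentum 0.9, weight decay 1e-4.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4
)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(90):                      # 90 epochs, as reported
    for images, labels in train_loader:      # train_loader is assumed, not shown
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```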