Heterogeneous Bitwidth Binarization in Convolutional Neural Networks
Authors: Joshua Fromm, Shwetak Patel, Matthai Philipose
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we show that it is feasible and useful to select bitwidths at the parameter granularity during training. For instance, a heterogeneously quantized version of modern networks such as AlexNet and MobileNet, with the right mix of 1-, 2-, and 3-bit parameters that average to just 1.4 bits, can equal the accuracy of homogeneous 2-bit versions of these networks. Further, we provide analyses to show that the heterogeneously binarized systems yield FPGA- and ASIC-based implementations that are correspondingly more efficient in both circuit area and energy efficiency than their homogeneous counterparts. We present a rigorous empirical evaluation (including on highly optimized modern networks such as Google's MobileNet) to show that heterogeneity yields equivalent accuracy at significantly lower average bitwidth. (A worked check of the 1.4-bit average appears after the table.) |
| Researcher Affiliation | Collaboration | Josh Fromm, Department of Electrical Engineering, University of Washington, Seattle, WA 98195, jwfromm@uw.edu; Shwetak Patel, Department of Computer Science, University of Washington, Seattle, WA 98195, shwetak@cs.washington.edu; Matthai Philipose, Microsoft Research, Redmond, WA 98052, matthaip@microsoft.com |
| Pseudocode | Yes | Algorithm 1: Generation of bit map M. Input: a tensor T of size N and an average bitwidth B. Output: a bit map M that can be used in Equation 5 to heterogeneously binarize T. 1: R = T (initialize R, which contains values that have not yet been assigned a bitwidth). 2: x = 0. 3: P = DistFromAvg(B) (generate distribution of bits to fit the average). 4: for (b, pb) in P do (b is a bitwidth and pb is the percentage of T to binarize to width b). 5: S = SortHeuristic(R) (sort indices of remaining values by suitability for b-bit binarization). 6: M[S[x : x + pb·N]] = b. 7: R = R \ R[S[x : x + pb·N]] (do not consider these indices in the next step). 8: x += pb·N. 9: end for. (A runnable sketch of this algorithm appears after the table.) |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., repository link, explicit code release statement) for its source code. |
| Open Datasets | Yes | AlexNet with batch normalization (AlexNet-BN) is the standard model used in binarization work due to its longevity and the general acceptance that improvements made to accuracy transfer well to more modern architectures. For this experiment, we used the CIFAR-10 dataset with a deliberately hobbled (4-layer fully convolutional) model. On the ImageNet dataset with AlexNet and MobileNet models, we perform extensive experiments to validate the effectiveness of HBNNs compared to the state of the art and full-precision accuracy. |
| Dataset Splits | No | The paper mentions using well-known datasets like CIFAR-10 and ImageNet but does not explicitly state the specific training/validation/test dataset splits (e.g., percentages, sample counts, or explicit standard split citations) used for reproducibility. |
| Hardware Specification | Yes | There have been several recent binary convolutional neural network implementations on FPGAs and ASICs that provide a baseline we can use to estimate the performance of HBNNs on ZC706 FPGA platforms (Umuroglu et al., 2017) and on ASIC hardware (Alemdar et al., 2017). |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not provide a specific version number for it or any other key software dependencies. |
| Experiment Setup | Yes | We train all models using an SGD solver with learning rate 0.01, momentum 0.9, and weight decay 1e-4, with randomly initialized weights, for 90 epochs in PyTorch. (A minimal sketch of this setup appears after the table.) |
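
For context on the abstract's "average to just 1.4 bits" claim: the average bitwidth is simply the fraction-weighted mean of the per-parameter bitwidths. A minimal check, assuming an illustrative mix of 1-, 2-, and 3-bit fractions (the paper's exact proportions are not quoted above):

```python
# Illustrative (assumed) mix: 70% 1-bit, 20% 2-bit, 10% 3-bit parameters.
fractions = {1: 0.7, 2: 0.2, 3: 0.1}
avg_bits = sum(b * p for b, p in fractions.items())
assert abs(avg_bits - 1.4) < 1e-9  # 0.7 + 0.4 + 0.3 = 1.4 bits on average
```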
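
The quoted Algorithm 1 is pseudocode; below is a minimal runnable NumPy sketch of it. The helpers `dist_from_avg` and `sort_heuristic` are assumptions standing in for the paper's unspecified DistFromAvg and SortHeuristic (here: mix the two widths bracketing the target average, and rank remaining values by magnitude), and the sketch re-sorts the remaining values on each pass rather than tracking the cumulative offset x.

```python
import numpy as np

def dist_from_avg(avg_bits, widths=(1, 2, 3)):
    # Assumed helper: mix the two widths that bracket the target average
    # so the fraction-weighted mean equals avg_bits exactly.
    lo = max(w for w in widths if w <= avg_bits)
    hi = min(w for w in widths if w >= avg_bits)
    if lo == hi:
        return [(lo, 1.0)]
    p_hi = (avg_bits - lo) / (hi - lo)
    return [(lo, 1.0 - p_hi), (hi, p_hi)]

def sort_heuristic(values):
    # Assumed heuristic: small-magnitude values are treated as the best
    # candidates for the current (lowest remaining) bitwidth.
    return np.argsort(np.abs(values))

def generate_bit_map(T, avg_bits):
    # Algorithm 1: assign each element of T a bitwidth so that the mean
    # assigned bitwidth is approximately avg_bits.
    T = np.asarray(T, dtype=np.float64).ravel()
    N = T.size
    M = np.zeros(N, dtype=np.int64)     # bit map to return
    remaining = np.arange(N)            # R: indices not yet assigned
    for b, p_b in dist_from_avg(avg_bits):
        order = sort_heuristic(T[remaining])    # rank what is left
        take = order[: int(round(p_b * N))]     # claim a p_b share of T
        M[remaining[take]] = b
        remaining = np.delete(remaining, take)  # R = R \ assigned indices
    if remaining.size:                  # rounding leftovers keep last width
        M[remaining] = b
    return M

# Example: a 10-element tensor mapped to an average of 1.4 bits.
print(generate_bit_map(np.random.randn(10), 1.4).mean())  # -> 1.4
```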
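
The reported training setup maps directly onto a standard PyTorch loop. Below is a minimal sketch with the stated hyperparameters (SGD, learning rate 0.01, momentum 0.9, weight decay 1e-4, random initialization, 90 epochs); the tiny model and dummy batch are placeholders rather than the authors' AlexNet/MobileNet code, and the binarization step itself is omitted:

```python
import torch
import torch.nn as nn

model = nn.Sequential(                      # placeholder net, randomly initialized
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.01, momentum=0.9, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()
train_loader = [(torch.randn(8, 3, 32, 32),   # one dummy batch standing in
                 torch.randint(0, 10, (8,)))] # for a real DataLoader

for epoch in range(90):                       # 90 epochs, as reported
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
```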