WrapNet: Neural Net Inference with Ultra-Low-Precision Arithmetic
Authors: Renkun Ni, Hong-min Chu, Oscar Castaneda, Ping-yeh Chiang, Christoph Studer, Tom Goldstein
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of our approach using both software and hardware platforms. |
| Researcher Affiliation | Academia | Renkun Ni University of Maryland rn9zm@cs.umd.edu Hong-min Chu University of Maryland hmchu@cs.umd.edu Oscar Castañeda ETH Zurich caoscar@ethz.ch Ping-yeh Chiang University of Maryland pchiang@cs.umd.edu Christoph Studer ETH Zurich studer@ethz.ch Tom Goldstein University of Maryland tomg@cs.umd.edu |
| Pseudocode | Yes | Algorithm 1 (Carry Amount Calculation): given v and b, compute u = Σ_i [((sign(v_i)+1)/2)·v_i + ((−sign(v_i)+1)/2)·(v_i + 2^b)]; initialize c_1 = u, r_1 = 0, c = 0; while c_i ≠ 0 do c_{i+1} = ⌊(c_i + r_i)/2^b⌋, r_{i+1} = (c_i + r_i) mod 2^b, c = c + c_{i+1}, c_i = c_{i+1}, r_i = r_{i+1}; end; return c. (A runnable sketch of this procedure appears after the table.) |
| Open Source Code | No | The paper mentions extending the Gemmlowp library (Jacob et al., 2016) but does not explicitly state that the authors' own implementation code or extensions for WrapNet are open-sourced or publicly available. |
| Open Datasets | Yes | We compare the accuracy and efficiency of WrapNet to networks with full-precision accumulators using the CIFAR-10 and ImageNet datasets. |
| Dataset Splits | No | The paper mentions using the CIFAR-10 and ImageNet datasets but does not provide explicit percentages or counts for training, validation, and test splits, nor does it explicitly state that the standard splits of these datasets are used as part of its methodology. |
| Hardware Specification | Yes | We conduct an efficiency analysis of parallelization by bit-packing, both with and without vector operations, on an Intel i7-7700HQ CPU operating at 2.80 GHz. [...] To illustrate the potential benefits of WrapNet for custom hardware accelerators, we have implemented a multiply-accumulate (MAC) unit in a commercial 28nm CMOS technology. |
| Software Dependencies | No | The paper mentions software components such as 'Gemmlowp' and AVX2 vector instructions, as well as hardware design tools like 'Synopsys Design Compiler (DC)' and 'Cadence Innovus', but it does not specify version numbers for any software libraries or frameworks used for training or inference, which would be necessary for full reproducibility. |
| Experiment Setup | Yes | We set the transition slope k = 2 and the initial overflow rate p = 5%. The overflow penalty coefficients for CIFAR-10 and ImageNet are 0.01 and 0.001, respectively. For the CIFAR-10 results, we use ADAM as our optimizer with an initial learning rate of 0.001. For both warm-up and fine-tuning stages, we run 200 epochs, and the learning rate is divided by 10 every 60 epochs. For all the ImageNet results, we use SGD with momentum 0.9 and weight decay 1×10⁻⁴ as our optimizer. We run 60 epochs for both warm-up and fine-tuning stages, where the initial learning rate is 0.01 and is divided by 10 at epochs 20, 40, and 50. (A configuration sketch of these schedules appears after the table.) |
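
To make the quoted Algorithm 1 concrete, here is a minimal Python sketch of the carry-counting procedure as transcribed in the Pseudocode row. It assumes `v` is a sequence of signed integer products and `b` is the accumulator bit-width; the function name `carry_amount` and the use of NumPy are illustrative choices, not part of the paper.

```python
import numpy as np

def carry_amount(v, b):
    """Count the wraparounds (carries) a b-bit accumulator would produce
    while summing the signed integer products in v (sketch of Algorithm 1)."""
    v = np.asarray(v, dtype=np.int64)
    # Map each signed product to its unsigned b-bit (two's-complement) value:
    # non-negative entries stay as-is, negative entries become v_i + 2**b.
    u = int(np.where(v >= 0, v, v + 2**b).sum())

    c_i, r_i, c = u, 0, 0
    while c_i != 0:
        c_next = (c_i + r_i) // 2**b   # carries spilling past the b-bit range
        r_next = (c_i + r_i) % 2**b    # value left in the b-bit accumulator
        c += c_next
        c_i, r_i = c_next, r_next
    return c
```

For example, `carry_amount([100, 100, 100], b=8)` returns 1, since the running sum of 300 wraps once in an 8-bit accumulator.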
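
The optimizers and learning-rate schedules quoted in the Experiment Setup row can also be written as a short configuration sketch. This is an assumption-laden illustration in PyTorch: the `model` is a placeholder, and the scheduler choices (`StepLR`, `MultiStepLR`) are one way to realize "divided by 10 every 60 epochs" and "divided by 10 at epochs 20, 40, and 50"; the paper does not state which framework or scheduler classes it used.

```python
import torch

# Placeholder model; the paper's quantized architectures are not reproduced here.
model = torch.nn.Linear(10, 10)

# CIFAR-10 stages (warm-up and fine-tuning, 200 epochs each):
# Adam, initial learning rate 0.001, divided by 10 every 60 epochs.
cifar_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
cifar_sched = torch.optim.lr_scheduler.StepLR(cifar_opt, step_size=60, gamma=0.1)

# ImageNet stages (warm-up and fine-tuning, 60 epochs each):
# SGD with momentum 0.9 and weight decay 1e-4, initial learning rate 0.01,
# divided by 10 at epochs 20, 40, and 50.
imagenet_opt = torch.optim.SGD(model.parameters(), lr=1e-2,
                               momentum=0.9, weight_decay=1e-4)
imagenet_sched = torch.optim.lr_scheduler.MultiStepLR(
    imagenet_opt, milestones=[20, 40, 50], gamma=0.1)
```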