Deep Learning with Limited Numerical Precision

Authors: Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, Pritish Narayanan

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test the validity of the proposed approach by training deep neural networks for the MNIST (Lecun & Cortes) and CIFAR10 (Krizhevsky et al., 2012) image classification tasks. Deep networks trained using 16-bit wide fixed-point and stochastic rounding achieve nearly the same performance as that obtained when trained using 32-bit floating-point computations. (See the first sketch after this table.)
Researcher Affiliation | Industry | Suyog Gupta (SUYOG@US.IBM.COM), Ankur Agrawal (ANKURAGR@US.IBM.COM), Kailash Gopalakrishnan (KAILASH@US.IBM.COM): IBM T. J. Watson Research Center, Yorktown Heights, NY 10598; Pritish Narayanan (PNARAYA@US.IBM.COM): IBM Almaden Research Center, San Jose, CA 95120
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks; it describes its procedures in prose rather than in a formal pseudocode format.
Open Source Code | No | The paper does not provide concrete access to source code. There is no specific repository link, explicit code release statement, or mention of code in supplementary materials for the methodology described.
Open Datasets | Yes | We test the validity of the proposed approach by training deep neural networks for the MNIST (Lecun & Cortes) and CIFAR10 (Krizhevsky et al., 2012) image classification tasks.
Dataset Splits | No | The paper specifies training and test sets for the MNIST and CIFAR10 datasets (e.g., '60,000 training images and 10,000 test images' for MNIST), but it does not explicitly specify any validation splits or cross-validation methodology.
Hardware Specification | Yes | Our prototype is implemented on an off-the-shelf FPGA card featuring a Xilinx Kintex325T FPGA and 8 GB DDR3 memory, and communicating with the host PC over a PCIe bus. This FPGA has 840 DSP multiply-accumulate units and almost 2 MB of on-chip block RAM.
Software Dependencies | No | The paper mentions 'vendor-supplied BLAS libraries' and 'Xilinx’s Vivado synthesis and place-and-route tool' but does not provide specific version numbers for these software components.
Experiment Setup | Yes | The weights in each layer are initialized by sampling random values from N(0, 0.01), while the bias vectors are initialized to 0. The network is trained using minibatch stochastic gradient descent (SGD) with a minibatch size of 100 to minimize the cross-entropy objective function. For CNNs, an exponentially decreasing learning rate is used, scaling it by a factor of 0.95 after every epoch of training. The learning rate for the first epoch is set to 0.1. Momentum (p = 0.9) is used to speed up SGD convergence. The weight decay parameter is set to 0.0005 for all layers. (See the second sketch after this table.)
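
The 16-bit result quoted under Research Type rests on the paper's stochastic rounding into a fixed-point format with IL integer bits and FL fractional bits: a value is snapped onto the grid of spacing 2^-FL, rounding up with probability equal to the fractional remainder so that the rounding is unbiased in expectation. Below is a minimal NumPy sketch of that idea, not the authors' FPGA or CPU implementation; the function name, the <8, 8> split in the example, and the saturation convention for out-of-range values are illustrative assumptions.

```python
import numpy as np

def stochastic_round_fixed_point(x, integer_bits, fraction_bits, rng=None):
    """Stochastically round x onto a signed <IL, FL> fixed-point grid.

    The grid spacing is eps = 2**-fraction_bits. A value is rounded down to the
    nearest multiple of eps with probability proportional to its distance from
    the grid point above, and rounded up otherwise, so the expected rounded
    value equals x. Out-of-range values are saturated (an assumed overflow rule).
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = 2.0 ** -fraction_bits
    scaled = np.asarray(x, dtype=np.float64) / eps
    floor = np.floor(scaled)
    round_up = rng.random(floor.shape) < (scaled - floor)  # P(round up) = fractional part
    q = (floor + round_up) * eps
    limit = 2.0 ** (integer_bits - 1)                       # signed range of <IL, FL>
    return np.clip(q, -limit, limit - eps)

# Illustrative 16-bit word split as <8, 8>; the paper explores several splits.
x = np.array([0.30, -1.27, 3.14159])
print(stochastic_round_fixed_point(x, integer_bits=8, fraction_bits=8))
```

Because the rounding error is zero in expectation, small gradient contributions are not systematically discarded the way they are under round-to-nearest, which is the mechanism the paper credits for 16-bit training tracking 32-bit floating-point accuracy.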
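The hyperparameters quoted under Experiment Setup can be gathered into one training configuration. The sketch below is a modern PyTorch rendering of those values, not the authors' implementation (the paper used its own CPU and FPGA code paths); the MNIST-style layer sizes, the helper names init_weights and train_one_epoch, and the data loader are assumptions, while the initialization, minibatch size, learning-rate schedule, momentum, and weight decay follow the quoted numbers.

```python
import torch
from torch import nn, optim

def init_weights(module):
    """Initialize weights from N(0, 0.01) and biases to zero, as quoted above."""
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        nn.init.normal_(module.weight, mean=0.0, std=0.01)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Stand-in MNIST-sized fully connected network; the layer sizes are assumptions.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 1000), nn.ReLU(),
                      nn.Linear(1000, 1000), nn.ReLU(), nn.Linear(1000, 10))
model.apply(init_weights)

criterion = nn.CrossEntropyLoss()                       # cross-entropy objective
optimizer = optim.SGD(model.parameters(), lr=0.1,       # first-epoch learning rate 0.1
                      momentum=0.9,                     # momentum 0.9
                      weight_decay=0.0005)              # weight decay 0.0005
# CNN schedule: scale the learning rate by 0.95 after every epoch.
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

def train_one_epoch(loader):
    """One epoch of minibatch SGD; `loader` is assumed to yield batches of 100."""
    model.train()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

In the paper's setting, the quantization of weights, activations, and gradients to 16-bit fixed point with stochastic rounding (first sketch) would be layered on top of these otherwise standard SGD updates.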