Bit-Pragmatic Deep Neural Network Computing

Authors: Jorge Albericio, Patrick Judd, Alberto Delmas, Sayeh Sharify, Andreas Moshovos

ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Measurements demonstrate that for the convolutional layers of Convolutional Neural Networks during inference, PRA improves performance by 4.3x over the DaDianNao (DaDN) accelerator (Chen et al., 2014) and by 4.5x when DaDN uses an 8-bit quantized representation (Warden, 2016). Experimental measurements with recent CNNs for image classification demonstrate that the most straightforward PRA variant boosts average performance for the convolutional layers to 2.59x over the state-of-the-art DaDN accelerator.
Researcher Affiliation | Academia | Jorge Albericio, Patrick Judd, Alberto Delmas Lascorz, Sayeh Sharify & Andreas Moshovos, Electrical and Computer Engineering, University of Toronto, Toronto, ON, M5S 3G4, Canada. {jorge, juddpatr, delmasl1, sayeh, moshovos}@ece.utoronto.ca
Pseudocode | No | The paper describes the Pragmatic engine and its units (e.g., Figure 2b, Figure 4) but does not provide structured pseudocode or algorithm blocks. (An illustrative sketch of the essential-bit computation PRA performs appears after this table.)
Open Source Code | No | The paper does not provide an explicit statement or link to its own open-source code for the described methodology. It cites external resources such as 'https://github.com/google/gemmlowp' (Google, 2016), but these are not the authors' own implementation.
Open Datasets | Yes | Experimental measurements with recent CNNs for image classification demonstrate that the most straightforward PRA variant boosts average performance for the convolutional layers to 2.59x over the state-of-the-art DaDN accelerator. Table 2: Per convolutional layer activation precision profiles. AlexNet ... VGG 19.
Dataset Splits | No | The paper mentions using 'recent CNNs for image classification' and provides 'Per convolutional layer activation precision profiles' in Table 2, but does not explicitly state the dataset splits (e.g., training, validation, test percentages or counts) or refer to standard predefined splits for these networks.
Hardware Specification | No | The paper mentions that designs were synthesized with the Synopsys Design Compiler for a TSMC 65nm library and that memory blocks were modeled using CACTI and Destiny, but it does not specify the particular hardware (e.g., CPU, GPU models, or compute cluster specifications) on which the simulations or experiments were executed.
Software Dependencies | No | The paper mentions software tools such as the Synopsys Design Compiler, CACTI, and Destiny for modeling and synthesis, and references TensorFlow for quantization. However, it does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | This section evaluates the single-stage shifting PRA configuration of Sections 5 and 5.1, and the 2-stage shifting variants of Section 5.1. Section 6.1 reports performance while Section 6.2 reports area and power. In this section, all PRA systems use pallet synchronization. Configuration PRAxR-2b refers to a configuration using x SSRs. This work investigates a software-guided approach where the precision requirements of each layer are used to zero out a number of prefix and suffix bits at the output of each layer. (An illustrative sketch of this precision-guided bit trimming also appears after this table.)
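
Since the paper provides no pseudocode, the following is a minimal sketch (not the authors' code) of the essential-bit inner product that Bit-Pragmatic (PRA) implements in hardware: each activation is decomposed into its nonzero powers of two ("oneffsets"), each oneffset shifts the corresponding weight, and the shifted weights are accumulated. Work therefore scales with the count of essential activation bits rather than the full fixed-point width processed by a bit-parallel engine such as DaDianNao, which is the source of the reported speedups. Function names and the 16-bit width are illustrative assumptions.

```python
def oneffsets(activation: int):
    """Yield the exponents of the nonzero bits of a non-negative fixed-point activation."""
    exponent = 0
    while activation:
        if activation & 1:
            yield exponent
        activation >>= 1
        exponent += 1


def pra_inner_product(activations, weights):
    """Bit-serial inner product that processes only essential activation bits."""
    accumulator = 0
    cycles = 0
    for a, w in zip(activations, weights):
        for exp in oneffsets(a):
            accumulator += w << exp   # one shift-and-add per essential bit
            cycles += 1               # rough proxy for PRA serial cycles
    return accumulator, cycles


if __name__ == "__main__":
    acts = [0b0000000000000101, 0b0000000000110000, 0]   # few essential bits each
    wts = [3, -2, 7]
    result, essential_cycles = pra_inner_product(acts, wts)
    # A 16-bit bit-parallel/bit-serial baseline would spend 16 bit slots per activation.
    print(result, essential_cycles, 16 * len(acts))
```

In this toy example only 4 essential bits are processed instead of 48 bit slots, mirroring (in idealized form) how PRA's performance depends on the essential bit content of the activations.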
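
The software-guided precision approach quoted above can likewise be sketched in a few lines: given a per-layer precision profile (as in Table 2), prefix bits above the needed most-significant bit and suffix bits below the needed least-significant bit are zeroed at the output of each layer, which reduces the essential bit content PRA must process. This is a hedged illustration, not the authors' implementation; the `profile` values and names below are hypothetical.

```python
def trim_precision(activation: int, msb: int, lsb: int) -> int:
    """Zero all bits above position `msb` and below position `lsb`, keeping the range in between."""
    mask = ((1 << (msb - lsb + 1)) - 1) << lsb
    return activation & mask


# Hypothetical per-layer profile: layer name -> (msb, lsb) of the required bit range.
profile = {"conv1": (9, 0), "conv2": (8, 1), "conv3": (7, 2)}

layer_output = 0b1010110110110111          # example 16-bit fixed-point activation
msb, lsb = profile["conv3"]
print(bin(trim_precision(layer_output, msb, lsb)))
```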