High Performance Zero-Memory Overhead Direct Convolutions

Authors: Jiyuan Zhang, Franz Franchetti, Tze Meng Low

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we demonstrate that direct convolution, when implemented correctly, eliminates all memory overhead, and yields performance that is between 10% and 400% better than existing high-performance implementations of convolution layers on conventional and embedded CPU architectures. We also show that a high-performance direct convolution exhibits better scaling, i.e., it suffers less performance drop as the number of threads increases. Section 5.1, Experimental Setup: We run our experiments on Intel Core i7-4770K, AMD FX(tm)-8350, and ARM Cortex-A57 architectures. The architecture details of those platforms are shown in Table 1. (A minimal direct-convolution loop-nest sketch illustrating the zero-memory-overhead claim follows this table.)
Researcher Affiliation | Academia | Jiyuan Zhang, Franz Franchetti, Tze Meng Low; Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA.
Pseudocode | Yes | Algorithm 1 (Naive Convolution Algorithm), Algorithm 2 (Reorder Convolution Algorithm), and Algorithm 3 (Parallelized Direct Convolution Algorithm). (A hedged sketch of a parallelized direct-convolution loop nest also follows the table.)
Open Source Code | No | The paper does not state that the code for their direct convolution implementation is publicly available, nor does it provide a link to it.
Open Datasets | Yes | All implementations were run against all convolution layers found in AlexNet (Krizhevsky et al., 2012), GoogLeNet (Szegedy et al., 2015), and VGG (Simonyan & Zisserman, 2014).
Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits (e.g., percentages or sample counts) for the convolution layers benchmarked from AlexNet, GoogLeNet, and VGG.
Hardware Specification | Yes | Platform: We run our experiments on Intel Core i7-4770K, AMD FX(tm)-8350, and ARM Cortex-A57 architectures. The architecture details of those platforms are shown in Table 1. Table 1 (Details of specific architectures used): Intel i7-4770K: Haswell architecture, 3.5 GHz, 4 cores, Nvec = 8; AMD FX(tm)-8350: Piledriver, 4 GHz, 4 cores, Nvec = 8; ARM Cortex-A57: ARMv8, 1.1 GHz, 2 cores, Nvec = 4.
Software Dependencies | No | The paper mentions software such as the Intel Math Kernel Library (MKL), OpenBLAS, and NNPACK, but does not provide specific version numbers for these components, which are required for reproducibility.
Experiment Setup | No | The paper details algorithmic and architectural mapping strategies for direct convolution, but does not provide typical experimental setup details such as hyperparameters (e.g., learning rate, batch size, number of epochs) for training a deep neural network.
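
To give concrete shape to the zero-memory-overhead claim quoted under Research Type: a naive direct convolution is a plain loop nest that reads the input and kernel and writes the output, allocating nothing, whereas im2col-based GEMM convolutions must first materialize a patch matrix roughly K*K times the size of the input. The following is a minimal sketch, not the paper's implementation; the function names, data layouts (input [C][H][W], kernel [M][C][K][K], output [M][Ho][Wo]), unit stride, and lack of padding are all illustrative assumptions.

```c
#include <stddef.h>

/* Naive direct convolution (stride 1, no padding), in the spirit of the
 * paper's Algorithm 1: it reads the input and kernel and writes the output,
 * allocating no scratch memory at all. Layouts are assumptions:
 * input [C][H][W], kernel [M][C][K][K], output [M][Ho][Wo],
 * with Ho = H - K + 1 and Wo = W - K + 1. */
static void conv_direct(const float *in, const float *ker, float *out,
                        size_t C, size_t H, size_t W, size_t M, size_t K)
{
    size_t Ho = H - K + 1, Wo = W - K + 1;
    for (size_t m = 0; m < M; m++)
        for (size_t ho = 0; ho < Ho; ho++)
            for (size_t wo = 0; wo < Wo; wo++) {
                float acc = 0.0f;
                for (size_t c = 0; c < C; c++)
                    for (size_t kh = 0; kh < K; kh++)
                        for (size_t kw = 0; kw < K; kw++)
                            acc += in[(c * H + ho + kh) * W + wo + kw]
                                 * ker[((m * C + c) * K + kh) * K + kw];
                out[(m * Ho + ho) * Wo + wo] = acc;
            }
}

/* By contrast, an im2col-based convolution first copies every K x K x C
 * input patch into a scratch matrix of C*K*K rows by Ho*Wo columns, i.e.
 * roughly K*K times the input size, before calling a GEMM. This helper
 * computes the number of extra floats that the direct method avoids. */
static size_t im2col_scratch_floats(size_t C, size_t H, size_t W, size_t K)
{
    size_t Ho = H - K + 1, Wo = W - K + 1;
    return (C * K * K) * (Ho * Wo);
}
```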
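
As a rough illustration of what the parallelization in Algorithm 3 amounts to (again a hedged sketch under the same assumed layouts, not the paper's blocked and vectorized loop ordering), the outer output-channel and output-row loops of the direct convolution above can be split across threads with OpenMP, since each iteration writes a disjoint slice of the output:

```c
#include <stddef.h>
#include <omp.h>

/* Sketch of a parallelized direct convolution: the (m, ho) iteration space
 * is distributed across threads. No two iterations write the same output
 * element, so no synchronization is needed. The paper's Algorithm 3
 * additionally orders and blocks these loops for SIMD registers and caches,
 * which this sketch omits. */
static void conv_direct_parallel(const float *in, const float *ker, float *out,
                                 size_t C, size_t H, size_t W,
                                 size_t M, size_t K)
{
    size_t Ho = H - K + 1, Wo = W - K + 1;
    #pragma omp parallel for collapse(2) schedule(static)
    for (size_t m = 0; m < M; m++)
        for (size_t ho = 0; ho < Ho; ho++)
            for (size_t wo = 0; wo < Wo; wo++) {
                float acc = 0.0f;
                for (size_t c = 0; c < C; c++)
                    for (size_t kh = 0; kh < K; kh++)
                        for (size_t kw = 0; kw < K; kw++)
                            acc += in[(c * H + ho + kh) * W + wo + kw]
                                 * ker[((m * C + c) * K + kh) * K + kw];
                out[(m * Ho + ho) * Wo + wo] = acc;
            }
}
```

Because the iterations are independent, this embarrassingly parallel outer structure is what lets a well-implemented direct convolution scale across cores; the paper's reported scaling advantage comes from combining it with a loop ordering tuned to the memory hierarchy.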