High Performance Zero-Memory Overhead Direct Convolutions
Authors: Jiyuan Zhang, Franz Franchetti, Tze Meng Low
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we demonstrate that direct convolution, when implemented correctly, eliminates all memory overhead, and yields performance that is between 10% and 400% better than existing high performance implementations of convolution layers on conventional and embedded CPU architectures. We also show that a high performance direct convolution exhibits better scaling performance, i.e. suffers less performance drop, when increasing the number of threads. Section 5.1 Experimental Setup. Platform: We run our experiments on Intel Core i7-4770K, AMD FX(tm)-8350, ARM Cortex-A57 architectures. The architecture details of those platforms are shown in Table 1. (A hedged sketch of the zero-overhead loop nest this claim rests on follows the table.) |
| Researcher Affiliation | Academia | Jiyuan Zhang, Franz Franchetti, Tze Meng Low; Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA. |
| Pseudocode | Yes | Algorithm 1 (Naive Convolution Algorithm), Algorithm 2 (Reorder Convolution Algorithm), and Algorithm 3 (Parallelized Direct Convolution Algorithm). Hedged C sketches of the naive and reordered/parallelized variants are given after the table. |
| Open Source Code | No | The paper does not state that the code for their direct convolution implementation is publicly available or provide a link to it. |
| Open Datasets | Yes | All implementations were run against all convolution layers found in AlexNet (Krizhevsky et al., 2012), GoogLeNet (Szegedy et al., 2015) and VGG (Simonyan & Zisserman, 2014). |
| Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits (e.g., percentages or sample counts) for the convolution layers benchmarked from AlexNet, GoogLeNet, and VGG. |
| Hardware Specification | Yes | Platform: We run our experiments on Intel Core i7-4770K, AMD FX(tm)-8350, and ARM Cortex-A57 architectures. The architecture details of those platforms are shown in Table 1. Table 1 (Details of specific architectures used): Intel i7-4770K (Haswell, 3.5 GHz, 4 cores, Nvec 8); AMD FX(tm)-8350 (Piledriver, 4 GHz, 4 cores, Nvec 8); ARM Cortex-A57 (ARMv8, 1.1 GHz, 2 cores, Nvec 4). |
| Software Dependencies | No | The paper mentions software like Intel Math Kernel Library (MKL), OpenBLAS, and NNPACK but does not provide the specific version numbers of these components that would be needed for reproducibility. |
| Experiment Setup | No | The paper details algorithmic and architectural mapping strategies for direct convolution but does not provide typical experimental setup details such as hyperparameters (e.g., learning rate, batch size, number of epochs) for training a deep neural network. |
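To make the zero-memory-overhead claim concrete: im2col-based approaches materialize a large intermediate matrix before calling GEMM, whereas a direct convolution reads the input and kernel tensors and writes the output tensor with no scratch buffer at all. Below is a minimal C sketch of such a naive loop nest, in the spirit of the paper's Algorithm 1; the NCHW-style layout, the stride-1/no-padding simplification, and all identifiers are illustrative assumptions, not the paper's exact formulation.

```c
#include <stddef.h>

/* Hedged sketch of a naive direct convolution (cf. the paper's
 * Algorithm 1). Layouts assumed here, not taken from the paper:
 *   input:  c_in  x h_in  x w_in
 *   kernel: c_out x c_in  x k_h x k_w
 *   output: c_out x h_out x w_out   (stride 1, no padding)
 * Note: no intermediate buffer is allocated anywhere. */
static void direct_conv_naive(const float *input, const float *kernel,
                              float *output,
                              int c_in, int h_in, int w_in,
                              int c_out, int k_h, int k_w)
{
    int h_out = h_in - k_h + 1;
    int w_out = w_in - k_w + 1;

    for (int m = 0; m < c_out; m++)                 /* output channel */
        for (int y = 0; y < h_out; y++)             /* output row    */
            for (int x = 0; x < w_out; x++) {       /* output column */
                float acc = 0.0f;
                for (int c = 0; c < c_in; c++)      /* input channel */
                    for (int i = 0; i < k_h; i++)   /* kernel row    */
                        for (int j = 0; j < k_w; j++) /* kernel col  */
                            acc += input[(c * h_in + (y + i)) * w_in + (x + j)]
                                 * kernel[((m * c_in + c) * k_h + i) * k_w + j];
                output[(m * h_out + y) * w_out + x] = acc;
            }
}
```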
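The paper's Algorithms 2 and 3 reorder and parallelize this loop nest for performance. The sketch below illustrates that idea only: it hoists the reduction loops, keeps the innermost loop streaming contiguously over an output row so a compiler can auto-vectorize it, and splits independent output channels across OpenMP threads (compile with -fopenmp). The loop order and parallelization strategy shown are illustrative choices; the paper derives its own ordering and blocking from the target architecture's vector width and cache hierarchy, which this sketch does not attempt.

```c
#include <stddef.h>

/* Hedged sketch of a reordered, parallelized direct convolution
 * (loosely in the spirit of the paper's Algorithms 2 and 3).
 * Same assumed layouts as the naive sketch above. */
static void direct_conv_reordered(const float *input, const float *kernel,
                                  float *output,
                                  int c_in, int h_in, int w_in,
                                  int c_out, int k_h, int k_w)
{
    int h_out = h_in - k_h + 1;
    int w_out = w_in - k_w + 1;

    /* Output channels are independent, so threads split them. */
    #pragma omp parallel for
    for (int m = 0; m < c_out; m++) {
        float *out_m = &output[(size_t)m * h_out * w_out];
        for (size_t p = 0; p < (size_t)h_out * w_out; p++)
            out_m[p] = 0.0f;                 /* zero-init this channel */

        /* Reduction loops hoisted outside the spatial loops; the
         * innermost x loop walks a contiguous output row, which is
         * friendly to compiler auto-vectorization. */
        for (int c = 0; c < c_in; c++)
            for (int i = 0; i < k_h; i++)
                for (int j = 0; j < k_w; j++) {
                    float w = kernel[((m * c_in + c) * k_h + i) * k_w + j];
                    for (int y = 0; y < h_out; y++)
                        for (int x = 0; x < w_out; x++)
                            out_m[y * w_out + x] +=
                                w * input[(c * h_in + (y + i)) * w_in + (x + j)];
                }
    }
}
```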