MEC: Memory-efficient Convolution for Deep Neural Network

Authors: Minsik Cho, Daniel Brand

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results show that MEC reduces memory consumption significantly with good speedup on both mobile and server platforms, compared with other indirect convolution algorithms.
Researcher Affiliation | Industry | IBM T. J. Watson Research Center, NY, USA.
Pseudocode | Yes | Algorithm 1 (O = Vanilla MEC(I, K, s)) and Algorithm 2 (O = MEC(I, K, s)) are provided. (A sketch of the lowering scheme follows the table.)
Open Source Code | No | The paper mentions implementing its algorithm with existing libraries and using other open-source convolutions for comparison, but it does not state that the code for MEC itself is open source or provide a link to it.
Open Datasets | Yes | For thorough comparison, we built a comprehensive benchmark set consisting of 12 unique convolution layers, cv1-cv12, from various public DNNs (He et al., 2015; Krizhevsky et al., 2012; Sermanet et al., 2013; Simonyan & Zisserman, 2014; Szegedy et al., 2014) as in Table 2.
Dataset Splits | No | The paper mentions 'mini-batch size' and refers to training, but it does not provide explicit details about train/validation/test dataset splits, percentages, or methods for partitioning the data.
Hardware Specification | Yes | Mobile: Android phone with ARM7 (MSM8960) for user-side inference and training (mini-batch size = 1). Server: Linux server with Intel CPU (E5-2680) and Nvidia GPU (P100) for inference and training (mini-batch size = 32).
Software Dependencies | No | We implemented MEC for CPU/GPU in C++ with multithreaded OpenBLAS, OpenMP, and cuBLAS, using single-precision (32-bit) floats. We also implemented a fully parallelized im2col-based convolution on CPU/GPU (Jia, 2014) with the same libraries. We downloaded an open-source FFT-based convolution (cuFFT; Theano-FFT) for GPU. We took an open-source Winograd-based convolution (Falcon, 2016) and optimized it to reduce memory overhead for CPU, and further modified/optimized it for GPU following (Lavin, 2015; Park et al., 2016a). (Note: specific version numbers for these libraries/tools are not provided. An im2col lowering sketch follows the table for contrast.)
Experiment Setup | Yes | The runtime in our experiments is measured as wall-clock time by a standard C++ library, running each algorithm 10 times and reporting the average. Mobile: Android phone with ARM7 (MSM8960) for user-side inference and training (mini-batch size = 1). Server: Linux server with Intel CPU (E5-2680) and Nvidia GPU (P100) for inference and training (mini-batch size = 32). T is a platform-dependent parameter (e.g., on CPU vs. GPU, or on GPU compute capability), and we found T around 100 to be a good threshold for the latest GPUs. (A timing-harness sketch follows the table.)
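
The Pseudocode row above cites Algorithm 1, Vanilla MEC. As a reading aid, here is a minimal single-channel C++ sketch of that lowering scheme, reconstructed from the paper's description; the function name, the row-major layout, and the naive inner loops (which stand in for the BLAS sgemm calls the authors use) are our assumptions, not the authors' code.

```cpp
#include <vector>
#include <cstddef>

// Hedged sketch of Vanilla MEC for one channel and one kernel.
// I: ih x iw input, K: kh x kw kernel (both row-major), s: stride.
// Returns the oh x ow output.
std::vector<float> vanilla_mec(const std::vector<float>& I, int ih, int iw,
                               const std::vector<float>& K, int kh, int kw,
                               int s) {
    const int oh = (ih - kh) / s + 1;
    const int ow = (iw - kw) / s + 1;

    // Lowering: copy ow overlapping vertical strips of width kw into L.
    // L is ow x (ih * kw), smaller than im2col's (oh * ow) x (kh * kw),
    // because horizontally overlapping pixels are stored only once.
    std::vector<float> L(static_cast<std::size_t>(ow) * ih * kw);
    for (int w = 0; w < ow; ++w)
        for (int r = 0; r < ih; ++r)
            for (int c = 0; c < kw; ++c)
                L[(static_cast<std::size_t>(w) * ih + r) * kw + c] =
                    I[static_cast<std::size_t>(r) * iw + w * s + c];

    // oh small matrix products: output row h reads a contiguous kh*kw
    // slice of every row of L, shifted down by h*s strip rows (h*s*kw floats).
    std::vector<float> O(static_cast<std::size_t>(oh) * ow, 0.0f);
    for (int h = 0; h < oh; ++h)
        for (int w = 0; w < ow; ++w) {
            const float* patch = &L[(static_cast<std::size_t>(w) * ih + h * s) * kw];
            float acc = 0.0f;
            for (int t = 0; t < kh * kw; ++t)
                acc += patch[t] * K[t];  // stands in for one sgemm per output row
            O[static_cast<std::size_t>(h) * ow + w] = acc;
        }
    return O;
}
```

The memory saving is visible in the shapes alone: the lowered matrix holds ow * ih * kw values instead of the oh * ow * kh * kw values an im2col lowering would allocate.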
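
For contrast, the im2col-based baseline the authors reimplemented (Jia, 2014) copies every receptive field in full. Below is a minimal single-channel lowering, again a sketch rather than the paper's fully parallelized implementation:

```cpp
#include <vector>
#include <cstddef>

// im2col lowering for one channel: every kh x kw patch is copied out in
// full, so the lowered matrix is (oh * ow) x (kh * kw) and overlapping
// pixels are duplicated. One large sgemm with the flattened kernels
// then finishes the convolution.
std::vector<float> im2col(const std::vector<float>& I, int ih, int iw,
                          int kh, int kw, int s) {
    const int oh = (ih - kh) / s + 1;
    const int ow = (iw - kw) / s + 1;
    std::vector<float> col(static_cast<std::size_t>(oh) * ow * kh * kw);
    std::size_t idx = 0;
    for (int h = 0; h < oh; ++h)
        for (int w = 0; w < ow; ++w)
            for (int r = 0; r < kh; ++r)
                for (int c = 0; c < kw; ++c)
                    col[idx++] =
                        I[static_cast<std::size_t>(h * s + r) * iw + w * s + c];
    return col;
}
```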
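
The Experiment Setup row says runtime was measured as wall-clock time with a standard C++ library, averaging 10 runs. A plausible harness, assuming std::chrono (the paper does not name the exact facility):

```cpp
#include <chrono>

// Hedged sketch of the measurement loop described above: run the given
// algorithm `reps` times and report the mean wall-clock time in ms.
template <typename F>
double mean_runtime_ms(F&& run, int reps = 10) {
    using clock = std::chrono::steady_clock;
    double total_ms = 0.0;
    for (int i = 0; i < reps; ++i) {
        auto t0 = clock::now();
        run();  // e.g. one convolution layer from the cv1-cv12 benchmark
        auto t1 = clock::now();
        total_ms += std::chrono::duration<double, std::milli>(t1 - t0).count();
    }
    return total_ms / reps;
}

// Hypothetical usage with the Vanilla MEC sketch above, on an
// AlexNet-like first layer (227x227 input, 11x11 kernel, stride 4):
// double ms = mean_runtime_ms([&] { vanilla_mec(I, 227, 227, K, 11, 11, 4); });
```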