High Performance Depthwise and Pointwise Convolutions on Mobile Devices

Authors: Pengfei Zhang, Eric Lo, Baotong Lu

Venue: AAAI 2020, pp. 6795-6802

Reproducibility assessment. Each entry below gives the variable, the result, and the LLM response:
Research Type: Experimental. "Experimental results show that our implementation can respectively achieve a speedup of up to 5.5x and 2.1x against TVM (Chen et al. 2018) on DWConv and PWConv."
Researcher Affiliation: Academia. "Pengfei Zhang, Eric Lo, Baotong Lu, The Chinese University of Hong Kong, {pfzhang, ericlo, btlu}@cse.cuhk.edu.hk"
Pseudocode: Yes. Algorithm 1: Unoptimized Depthwise Convolution; Algorithm 2: Depthwise Convolution (TF-Lite); Algorithm 3: PWConv Implementation by MM; Algorithm 4: High Performance Depthwise Convolution; Algorithm 5: Matrix Multiplication in BLAS Libraries; Algorithm 6: High Performance Matrix Multiplication. (Hedged sketches of the DWConv and PWConv operations these algorithms address appear after the last entry below.)
Open Source Code: No. The paper notes that TF-Lite is open source and references other open-source libraries (Ruy, OpenBLAS, Eigen), but it does not state that the authors' own implementation is open source, nor does it provide a link to it.
Open Datasets: No. The paper extracts DWConv and PWConv operations from MobileNet V1, MobileNet V2, and MnasNet, but it does not state which dataset is used to train or evaluate these operations in its experiments, nor does it provide access information for such a dataset.
Dataset Splits: No. The paper does not provide details on training, validation, or test dataset splits for its experiments.
Hardware Specification: Yes. "We run our experiments on a 2.0GHz quad-core ARM Cortex-A57. Each core has 48KB L1 instruction cache and 32KB L1 data cache. All cores share 2MB unified L2 cache."
Software Dependencies: No. The paper mentions several software components, such as TVM, TF-Lite, OpenBLAS, Ruy, and Eigen, and cites associated publications (e.g., "TVM (Chen et al. 2018)", "OpenBLAS (OpenBLAS 2015)", "Ruy (Google 2019)"), but it does not give explicit version numbers (e.g., v1.2.3) for these dependencies, which reproducibility would require.
Experiment Setup: No. The paper focuses on low-level optimization of convolution operations and discusses parameters relevant to those optimizations (e.g., Ho,b and Wo,b), but it does not report hyperparameters or system-level training settings for a deep learning model (e.g., learning rate, batch size, optimizer).
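
For readers who want the semantics behind the algorithm names listed above, below is a minimal C sketch of an unoptimized depthwise convolution in the spirit of Algorithm 1. It illustrates the operation under assumed conventions (CHW layout, stride 1, no padding, illustrative identifiers); it is not the authors' optimized kernel.

#include <stddef.h>

/* Unoptimized depthwise convolution (cf. Algorithm 1). Each input
 * channel c is convolved with its own Kh x Kw filter; unlike a normal
 * convolution, no mixing across channels takes place. Layout (CHW),
 * stride 1, and zero padding are assumptions for this sketch. */
static void dwconv_naive(const float *in,  /* C x Hi x Wi */
                         const float *flt, /* C x Kh x Kw */
                         float *out,       /* C x Ho x Wo */
                         int C, int Hi, int Wi, int Kh, int Kw)
{
    const int Ho = Hi - Kh + 1; /* output height for stride 1, no padding */
    const int Wo = Wi - Kw + 1; /* output width */
    for (int c = 0; c < C; c++)
        for (int ho = 0; ho < Ho; ho++)
            for (int wo = 0; wo < Wo; wo++) {
                float acc = 0.0f;
                for (int kh = 0; kh < Kh; kh++)
                    for (int kw = 0; kw < Kw; kw++)
                        acc += in[(size_t)c * Hi * Wi + (size_t)(ho + kh) * Wi + (wo + kw)]
                             * flt[(size_t)c * Kh * Kw + (size_t)kh * Kw + kw];
                out[(size_t)c * Ho * Wo + (size_t)ho * Wo + wo] = acc;
            }
}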
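
Likewise, Algorithm 3's "PWConv Implementation by MM" refers to the standard reduction of a pointwise (1x1) convolution to a matrix multiplication: viewing the CHW input as a Ci x N matrix with N = H * W and the filters as a Co x Ci matrix, the output is their Co x N product. The sketch below shows only this mapping; a tuned kernel would block and vectorize the loops (the paper tunes parameters such as Ho,b and Wo,b for that purpose), and all names here are assumed.

#include <stddef.h>

/* Pointwise (1x1) convolution as a plain matrix multiplication
 * (cf. Algorithm 3): out[co][n] = sum over ci of flt[co][ci] * in[ci][n]. */
static void pwconv_as_mm(const float *in,  /* Ci x N, with N = H * W */
                         const float *flt, /* Co x Ci */
                         float *out,       /* Co x N */
                         int Co, int Ci, int N)
{
    for (int co = 0; co < Co; co++)
        for (int n = 0; n < N; n++) {
            float acc = 0.0f;
            for (int ci = 0; ci < Ci; ci++)
                acc += flt[(size_t)co * Ci + ci] * in[(size_t)ci * N + n];
            out[(size_t)co * N + n] = acc;
        }
}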