High Performance Depthwise and Pointwise Convolutions on Mobile Devices
Authors: Pengfei Zhang, Eric Lo, Baotong Lu (pp. 6795-6802)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our implementation can respectively achieve a speedup of up to 5.5× and 2.1× against TVM (Chen et al. 2018) on DWConv and PWConv. |
| Researcher Affiliation | Academia | Pengfei Zhang, Eric Lo, Baotong Lu The Chinese University of Hong Kong {pfzhang, ericlo, btlu}@cse.cuhk.edu.hk |
| Pseudocode | Yes | Algorithm 1: Unoptimized Depthwise Convolution; Algorithm 2: Depthwise Convolution (TF-Lite); Algorithm 3: PWConv Implementation by MM; Algorithm 4: High Performance Depthwise Convolution; Algorithm 5: Matrix Multiplication in BLAS Libraries; Algorithm 6: High Performance Matrix Multiplication |
| Open Source Code | No | The paper mentions that TF-Lite is open source and references other open-source libraries (Ruy, OpenBLAS, Eigen), but it does not state that the authors' own implementation code is open source or provide a link to it. |
| Open Datasets | No | The paper extracts DWConv and PWConv operations from MobileNet V1, MobileNet V2, and MnasNet, but it does not explicitly state the specific dataset used for training or evaluating these operations within their experiments, nor does it provide access information for such a dataset. |
| Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits for their experiments. |
| Hardware Specification | Yes | We run our experiments on a 2.0GHz quad-core ARM Cortex-A57. Each core has 48KB L1 instruction cache and 32KB L1 data cache. All cores share 2MB unified L2 cache. |
| Software Dependencies | No | The paper mentions several software components like TVM, TF-Lite, OpenBLAS, Ruy, and Eigen, and provides years of associated publications (e.g., 'TVM (Chen et al. 2018)', 'OpenBLAS (OpenBLAS 2015)', 'Ruy (Google 2019)'), but it does not specify explicit version numbers (e.g., v1.2.3) for these dependencies as required for reproducibility. |
| Experiment Setup | No | The paper focuses on low-level optimizations of convolution operations and discusses parameters relevant to those optimizations (e.g., Ho,b, Wo,b), but it does not provide specific hyperparameters or system-level training settings for a deep learning model (e.g., learning rate, batch size, optimizer). |
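For readers unfamiliar with the two operations the paper optimizes, the following is a minimal NumPy sketch of what depthwise convolution (DWConv, one filter per channel) and pointwise convolution (PWConv, a 1×1 convolution that reduces to a matrix multiply) compute. This is purely illustrative; the function names and tensor layouts are assumptions, and it reflects none of the paper's ARM-specific optimizations (Algorithms 4 and 6).

```python
import numpy as np

def dwconv(x, k):
    """Naive depthwise convolution, stride 1, no padding.
    x: input feature map, shape (C, H, W)
    k: one kernel per channel, shape (C, Kh, Kw)
    Each channel is convolved independently with its own kernel."""
    C, H, W = x.shape
    _, Kh, Kw = k.shape
    out = np.zeros((C, H - Kh + 1, W - Kw + 1))
    for c in range(C):  # no cross-channel accumulation, unlike standard conv
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[c, i, j] = np.sum(x[c, i:i + Kh, j:j + Kw] * k[c])
    return out

def pwconv(x, w):
    """Pointwise (1x1) convolution: mixes channels at each spatial position.
    x: input feature map, shape (C, H, W)
    w: weight matrix, shape (Co, C)
    Equivalent to a (Co, C) x (C, H*W) matrix multiplication."""
    C, H, W = x.shape
    return (w @ x.reshape(C, H * W)).reshape(w.shape[0], H, W)
```

The PWConv-as-matrix-multiply view is why the paper benchmarks against BLAS-style GEMM implementations (Algorithm 5), while DWConv's lack of channel reuse is what makes it memory-bound and hard for generic GEMM libraries to accelerate.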