YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design

Authors: Yuxuan Cai, Hongjia Li, Geng Yuan, Wei Niu, Yanyu Li, Xulong Tang, Bin Ren, Yanzhi Wang

AAAI 2021

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results indicate that our pruning scheme achieves a 14× compression rate of YOLOv4 with 49.0 mAP. Under our YOLObile framework, we achieve 17 FPS inference speed using GPU on Samsung Galaxy S20. By incorporating our proposed GPU-CPU collaborative scheme, the inference speed is increased to 19.1 FPS, and outperforms the original YOLOv4 by a 5× speedup.
Researcher Affiliation Academia Yuxuan Cai¹, Hongjia Li¹, Geng Yuan¹, Wei Niu², Yanyu Li¹, Xulong Tang³, Bin Ren², Yanzhi Wang¹ (¹Northeastern University, ²William & Mary, ³University of Pittsburgh)
Pseudocode No The paper includes a section on the 'Reweighted Regularization Pruning Algorithm' which describes the algorithm mathematically, but it does not present it in a structured pseudocode or algorithm block format.
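Since the paper presents the reweighted regularization pruning algorithm only mathematically, a minimal sketch of one reweighting step may help convey the idea. This is an illustrative reconstruction, not the authors' implementation: the function names are hypothetical, and the weights are assumed to already be partitioned into blocks.

```python
import numpy as np

def reweighted_penalty(blocks, coeffs):
    """Regularization term: sum_i r_i * ||W_i||_2^2 over weight blocks."""
    return sum(r * np.sum(w ** 2) for r, w in zip(coeffs, blocks))

def update_coeffs(blocks, eps=1e-3):
    """Reweighting step: blocks that have shrunk toward zero receive a
    larger penalty coefficient (pushing them further to zero), while
    large-magnitude blocks are penalized less and are preserved."""
    return [1.0 / (np.sum(w ** 2) + eps) for w in blocks]
```

In training, the penalty would be added to the task loss each iteration, with the coefficients periodically refreshed via the reweighting step.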
Open Source Code Yes Source code is at: https://github.com/nightsnack/YOLObile.
Open Datasets Yes Our YOLObile is derived based on YOLOv4, with 320×320 input size, and train on MS COCO dataset (Lin et al. 2014).
Dataset Splits No The paper mentions training on the MS COCO dataset but does not specify the validation split used, either by percentage, sample count, or a reference to predefined validation sets.
Hardware Specification Yes Our models are trained on a server with eight NVIDIA RTX 2080Ti GPUs. ... We evaluate our framework on an off-the-shelf Samsung Galaxy S20 smartphone, which has a Qualcomm Snapdragon 865 Octa-core CPU and a Qualcomm Adreno 650 GPU.
Software Dependencies No The paper states 'The training methods are implemented using PyTorch API.' but does not provide specific version numbers for PyTorch or any other software dependencies crucial for reproduction.
Experiment Setup Yes Our YOLObile is derived based on YOLOv4, with 320×320 input size, and train on MS COCO dataset (Lin et al. 2014). ... We adopt 8×4 as our block size, i.e. 4 consecutive channels of 8 consecutive filters.
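The 8×4 block granularity (8 consecutive filters by 4 consecutive channels) can be sketched as magnitude-based block pruning on a 2-D weight matrix. This is a hypothetical illustration under the assumption of a simple per-block L2-norm criterion, not the authors' code:

```python
import numpy as np

def block_prune(weight, block=(8, 4), sparsity=0.5):
    """Zero out the lowest-magnitude blocks of a 2-D weight matrix.

    weight: (filters, channels) matrix; dims assumed divisible by block.
    block: (8, 4) = 8 consecutive filters x 4 consecutive channels.
    sparsity: fraction of blocks to prune.
    """
    bf, bc = block
    f, c = weight.shape
    # L2 norm of each (bf x bc) block -> shape (f // bf, c // bc).
    grouped = weight.reshape(f // bf, bf, c // bc, bc)
    norms = np.sqrt((grouped ** 2).sum(axis=(1, 3)))
    # Keep blocks whose norm is at or above the sparsity threshold.
    k = int(norms.size * sparsity)
    thresh = np.sort(norms, axis=None)[k] if k > 0 else -np.inf
    block_mask = norms >= thresh
    # Expand the block-level mask back to an element-level mask.
    mask = np.repeat(np.repeat(block_mask, bf, axis=0), bc, axis=1)
    return weight * mask
```

Pruning at this coarse, hardware-friendly granularity is what lets the compiler generate dense-like code for the remaining blocks, which the paper credits for the on-device speedup.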