EfficientFormer: Vision Transformers at MobileNet Speed

Authors: Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show the superiority of EfficientFormer in performance and speed on mobile devices.
Researcher Affiliation | Collaboration | 1Snap Inc., 2Northeastern University
Pseudocode | No | The paper describes methods like 'Latency Driven Slimming' and a 'gradient-based search algorithm' in prose, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code and models are available at https://github.com/snap-research/EfficientFormer.
Open Datasets | Yes | Our fastest model, EfficientFormer-L1, achieves 79.2% top-1 accuracy on the ImageNet-1K [34] classification task.
Dataset Splits | Yes | We experiment over COCO 2017 [79], which contains training and validation sets of 118K and 5K images, respectively.
Hardware Specification | Yes | Our models are trained on a cluster with NVIDIA A100 and V100 GPUs. The inference speed on iPhone 12 (A14 Bionic chip) is measured with iOS version 15 and averaged over 1,000 runs, with all available computing resources (NPU), or CPU only.
Software Dependencies | Yes | We implement EfficientFormer through PyTorch 1.11 [73] and the Timm library [74].
Experiment Setup | Yes | We follow the training recipe from DeiT [3] but mainly report results with 300 training epochs... We use the AdamW optimizer [75, 76], warm-up training with 5 epochs, and a cosine annealing learning rate schedule. The initial learning rate is set as 10^-3 (batch size 1024) and the minimum learning rate is 10^-5.
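
The training recipe quoted in the Experiment Setup row maps onto standard PyTorch/timm components. The following is a minimal sketch under that reading, not the authors' implementation: the model name 'efficientformer_l1' assumes a timm version that registers the EfficientFormer family, the weight decay value and warm-up start factor are illustrative assumptions not stated in the quoted text, and the scheduler is stepped once per epoch for brevity.

```python
# Minimal sketch of the quoted recipe: AdamW, 5-epoch warm-up, cosine annealing
# from 1e-3 (batch size 1024) down to a 1e-5 floor, 300 epochs.
import torch
import timm

EPOCHS = 300
WARMUP_EPOCHS = 5
BASE_LR = 1e-3   # initial learning rate at batch size 1024
MIN_LR = 1e-5    # minimum learning rate

# Assumes a timm version that registers EfficientFormer; otherwise use the
# reference implementation from the snap-research repository.
model = timm.create_model("efficientformer_l1", num_classes=1000)

# weight_decay=0.05 follows the DeiT recipe and is an assumption here.
optimizer = torch.optim.AdamW(model.parameters(), lr=BASE_LR, weight_decay=0.05)

# Linear warm-up for 5 epochs, then cosine annealing down to MIN_LR.
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.01, total_iters=WARMUP_EPOCHS)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=EPOCHS - WARMUP_EPOCHS, eta_min=MIN_LR)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[WARMUP_EPOCHS])

for epoch in range(EPOCHS):
    # ... one pass over ImageNet-1K with the DeiT-style augmentation pipeline ...
    scheduler.step()  # stepped per epoch here; per-iteration stepping is also common
```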