EfficientFormer: Vision Transformers at MobileNet Speed
Authors: Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show the superiority of EfficientFormer in performance and speed on mobile devices. |
| Researcher Affiliation | Collaboration | Snap Inc.; Northeastern University |
| Pseudocode | No | The paper describes methods like 'Latency Driven Slimming' and a 'gradient-based search algorithm' in prose, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and models are available at https://github.com/snap-research/EfficientFormer. |
| Open Datasets | Yes | Our fastest model, EfficientFormer-L1, achieves 79.2% top-1 accuracy on ImageNet-1K [34] classification task |
| Dataset Splits | Yes | We experiment over COCO2017 [79] which contains training and validation sets of 118K and 5K images, respectively. |
| Hardware Specification | Yes | Our models are trained on a cluster with NVIDIA A100 and V100 GPUs. The inference speed on iPhone 12 (A14 Bionic chip) is measured with iOS version 15 and averaged over 1,000 runs, with all available computing resources (NPU), or CPU only. (A hedged export sketch for this measurement setup follows the table.) |
| Software Dependencies | Yes | We implement EfficientFormer through PyTorch 1.11 [73] and Timm library [74] |
| Experiment Setup | Yes | We follow the training recipe from DeiT [3] but mainly report results with 300 training epochs... We use AdamW optimizer [75, 76], warm-up training with 5 epochs, and a cosine annealing learning rate schedule. The initial learning rate is set as 10⁻³ (batch size 1024) and the minimum learning rate is 10⁻⁵. (A minimal sketch of this recipe follows the table.) |
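
The experiment-setup row pins down the optimizer and schedule concretely. The snippet below is a minimal sketch of that recipe using PyTorch and the timm library named in the dependencies row; it is not the authors' training script. The model name `efficientformer_l1`, the weight-decay value, and the warm-up start learning rate are assumptions not stated in the quote above.

```python
# Sketch of the reported recipe: AdamW, 5-epoch warm-up, cosine annealing
# from 1e-3 down to 1e-5, 300 epochs at an effective batch size of 1024.
import torch
import timm
from timm.scheduler import CosineLRScheduler

EPOCHS = 300
# Assumes 'efficientformer_l1' is registered in the installed timm version.
model = timm.create_model("efficientformer_l1", pretrained=False)

# weight_decay=0.05 is an assumption following the DeiT recipe the paper adopts.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
scheduler = CosineLRScheduler(
    optimizer,
    t_initial=EPOCHS,     # cosine annealing over the full 300 epochs
    lr_min=1e-5,          # minimum learning rate from the paper
    warmup_t=5,           # 5 warm-up epochs
    warmup_lr_init=1e-6,  # assumed warm-up start LR (not stated in the quote)
    t_in_epochs=True,
)

for epoch in range(EPOCHS):
    # ... one training pass over ImageNet-1K at batch size 1024 ...
    scheduler.step(epoch + 1)  # timm schedulers are stepped per epoch
```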
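
The hardware row reports on-device latency on an iPhone 12, measured either with all compute units (NPU) or CPU only. That implies exporting the PyTorch model to Core ML; the sketch below shows one plausible export path with coremltools, under the assumption of a 224x224 input. The timing itself (averaged over 1,000 runs) would still be done on the device, not in this Python process.

```python
import torch
import timm
import coremltools as ct

# Hypothetical export path (not taken from the paper): trace the network and
# convert it twice, once allowing every compute unit (so the A14 NPU can be
# used) and once restricted to CPU, matching the two reported modes.
model = timm.create_model("efficientformer_l1", pretrained=False).eval()
example = torch.randn(1, 3, 224, 224)  # assumed 224x224 input resolution
traced = torch.jit.trace(model, example)

for units, tag in [(ct.ComputeUnit.ALL, "npu"), (ct.ComputeUnit.CPU_ONLY, "cpu")]:
    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(shape=example.shape)],
        compute_units=units,
        convert_to="mlprogram",
    )
    mlmodel.save(f"efficientformer_l1_{tag}.mlpackage")
# The 1,000-run latency averaging is performed on the phone (e.g. via an
# Xcode benchmark harness), not here.
```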