Fast Vision Transformers with HiLo Attention
Authors: Zizheng Pan, Jianfei Cai, Bohan Zhuang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct image classification experiments on ImageNet-1K [43], a large-scale image dataset which contains 1.2M training images and 50K validation images from 1K categories. We measure the model performance by Top-1 accuracy. Furthermore, we report the FLOPs, throughput, as well as training/test memory consumption on GPUs. |
| Researcher Affiliation | Academia | Zizheng Pan, Jianfei Cai, Bohan Zhuang; Department of Data Science & AI, Monash University, Australia |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/ziplab/LITv2. |
| Open Datasets | Yes | We conduct image classification experiments on ImageNet-1K [43], a large-scale image dataset which contains 1.2M training images and 50K validation images from 1K categories. |
| Dataset Splits | Yes | We conduct image classification experiments on ImageNet-1K [43], a large-scale image dataset which contains 1.2M training images and 50K validation images from 1K categories. |
| Hardware Specification | Yes | Throughput is tested on one NVIDIA RTX 3090 GPU and averaged over 30 runs. (Table 1 footnote). Evaluations are based on a batch size of 64 on one RTX 3090 GPU. (Figure 3 caption). Intel Core i9-10900X CPU @ 3.70GHz and NVIDIA GeForce RTX 3090 (Figure 6). (See the throughput sketch after the table.) |
| Software Dependencies | No | The paper mentions the 'mmdetection [4] framework' but does not provide specific version numbers for any software dependencies, libraries, or programming languages used for reproducibility. |
| Experiment Setup | Yes | All models are trained for 300 epochs from scratch on 8 V100 GPUs. At training time, we set the total batch size as 1,024. The input images are resized and randomly cropped into 224 × 224. The initial learning rate is set to 1 × 10⁻³ and the weight decay is set to 5 × 10⁻². We use the AdamW optimizer with a cosine decay learning rate scheduler. (See the training-setup sketch after the table.) |
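
The throughput figures in the hardware row follow a common measurement pattern: a fixed batch size, GPU warm-up, synchronization around the timed region, and averaging over repeated runs. Below is a minimal sketch of that protocol under the settings quoted above (batch size 64, 30 timed runs); the tiny stand-in model, the warm-up count, and the timing helper are assumptions for illustration, not the paper's released LITv2 code.

```python
import time
import torch
from torch import nn

# Placeholder model; substitute a LITv2 variant from https://github.com/ziplab/LITv2
# to reproduce the paper's throughput numbers.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=7, stride=4),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 1000),
).eval()

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

batch = torch.randn(64, 3, 224, 224, device=device)  # batch size of 64, as in Figure 3
runs = 30                                             # averaged over 30 runs, as in Table 1

with torch.no_grad():
    for _ in range(10):              # warm-up iterations (an assumption, not stated in the paper)
        model(batch)
    if device == "cuda":
        torch.cuda.synchronize()     # make sure queued kernels finish before timing
    start = time.time()
    for _ in range(runs):
        model(batch)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = time.time() - start

print(f"throughput: {runs * batch.shape[0] / elapsed:.1f} images/s")
```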
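
The training recipe in the last row maps onto a standard PyTorch loop. Here is a minimal sketch that wires the quoted hyperparameters (300 epochs, total batch size 1,024, 224 × 224 crops, AdamW with learning rate 1 × 10⁻³ and weight decay 5 × 10⁻², cosine decay) into an optimizer and scheduler; the toy model and synthetic data are placeholders so the snippet runs end to end, and the distributed 8-GPU setup, augmentations, and the LITv2 backbone itself are omitted.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hyperparameters quoted from the paper's ImageNet-1K recipe.
EPOCHS = 300          # 300 epochs from scratch
TOTAL_BATCH_SIZE = 1024  # total batch size across 8 V100 GPUs (not replicated here)
BASE_LR = 1e-3        # initial learning rate
WEIGHT_DECAY = 5e-2   # weight decay

# Placeholder model and data: stand-ins for a LITv2 backbone and ImageNet-1K,
# used only to keep the sketch self-contained.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=7, stride=4),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 1000),
)
data = TensorDataset(torch.randn(8, 3, 224, 224), torch.randint(0, 1000, (8,)))
loader = DataLoader(data, batch_size=4)

optimizer = torch.optim.AdamW(model.parameters(), lr=BASE_LR, weight_decay=WEIGHT_DECAY)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)

for epoch in range(2):  # use range(EPOCHS) for the full 300-epoch schedule
    for images, targets in loader:
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(images), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()    # cosine decay stepped once per epoch
```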