Auto-scaling Vision Transformers without Training
Authors: Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wang, Denny Zhou
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our As-ViT achieves strong performance on classification (83.5% top-1 on ImageNet-1k) and detection (52.7% mAP on COCO). and Table 5 demonstrates comparisons of our As-ViT to other models. Compared to previous Transformer-based and CNN-based architectures, As-ViT achieves state-of-the-art performance with a comparable number of parameters and FLOPs. |
| Researcher Affiliation | Collaboration | ¹University of Texas at Austin, ²University of Technology Sydney, ³Google. {wuyang.chen,atlaswang}@utexas.edu, weihuang.uts@gmail.com, {xianzhi,xiaodansong,dennyzhou}@google.com |
| Pseudocode | Yes | Algorithm 1: Training-free ViT Topology Search. and Algorithm 2: Training-free Auto-Scaling ViTs. (A hedged sketch of the training-free search loop follows this table.) |
| Open Source Code | Yes | Our code is available at https://github.com/VITA-Group/AsViT. |
| Open Datasets | Yes | Our As-ViT achieves strong performance on classification (83.5% top-1 on ImageNet-1k) and detection (52.7% mAP on COCO). and We benchmark our As-ViT on ImageNet-1k (Deng et al., 2009). Object detection is conducted on COCO 2017... |
| Dataset Splits | Yes | Object detection is conducted on COCO 2017 that contains 118,000 training and 5000 validation images. |
| Hardware Specification | Yes | the end-to-end model design and scaling process costs only 12 hours on one V100 GPU. and we conduct all training on TPUs. |
| Software Dependencies | No | We use TensorFlow and Keras for training implementations and conduct all training on TPUs. The paper names its software stack but provides no version numbers. |
| Experiment Setup | Yes | We set the default image size as 224×224, and use AdamW (Loshchilov & Hutter, 2017) as the optimizer with cosine learning rate decay (Loshchilov & Hutter, 2016). A batch size of 1024, an initial learning rate of 0.001, and a weight decay of 0.05 are adopted. (A config sketch of this setup follows the table.) |
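The paper's Algorithm 1 ranks candidate ViT topologies with a training-free proxy instead of training each candidate. Below is a minimal Python sketch of that idea, not the authors' actual procedure: `SEARCH_SPACE`, `sample_topology`, and `proxy_score` are hypothetical placeholders, and a real proxy (e.g., an expressivity measure computed at initialization, as the paper uses) would replace the random stub.

```python
import random

# Hypothetical toy search space. The real As-ViT space (patch/token sizes,
# attention splits, FFN expansion ratios, etc.) is richer; these fields
# exist only to make the sketch runnable.
SEARCH_SPACE = {
    "num_heads": [2, 4, 8],
    "mlp_ratio": [2, 4, 6],
    "depth":     [4, 8, 12],
}

def sample_topology():
    """Draw one candidate topology uniformly from the toy space."""
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def proxy_score(topology):
    """Stand-in for a training-free proxy scored at initialization
    (no gradient updates). This stub is random; a real implementation
    would build the network and measure, e.g., its expressivity."""
    return random.random()

def topology_search(num_candidates=100):
    """Sample candidates and keep the one with the best proxy score,
    mirroring the spirit (not the exact details) of Algorithm 1."""
    best, best_score = None, float("-inf")
    for _ in range(num_candidates):
        candidate = sample_topology()
        score = proxy_score(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

if __name__ == "__main__":
    topology, score = topology_search()
    print(f"selected topology: {topology} (proxy score {score:.3f})")
```

Because every candidate is scored by a cheap measurement at initialization rather than a training run, the whole design-and-scaling process can stay within the "12 hours on one V100 GPU" budget quoted above.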
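The quoted training setup (AdamW, cosine learning-rate decay, batch size 1024, initial learning rate 0.001, weight decay 0.05, 224×224 inputs) maps directly onto a Keras optimizer configuration. Here is a minimal sketch, assuming TensorFlow ≥ 2.11 (where `tf.keras.optimizers.AdamW` is built in); the epoch count is an assumption, since the excerpts above do not state it.

```python
import tensorflow as tf

# Hyperparameters quoted in the paper's setup.
IMAGE_SIZE   = 224
BATCH_SIZE   = 1024
BASE_LR      = 1e-3
WEIGHT_DECAY = 0.05

# Assumptions (not stated in the excerpts above).
EPOCHS          = 300                        # hypothetical epoch count
STEPS_PER_EPOCH = 1_281_167 // BATCH_SIZE    # ImageNet-1k training-set size

# Cosine learning-rate decay over the full training schedule.
lr_schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=BASE_LR,
    decay_steps=EPOCHS * STEPS_PER_EPOCH,
)

# AdamW: Adam with decoupled weight decay (Loshchilov & Hutter, 2017).
optimizer = tf.keras.optimizers.AdamW(
    learning_rate=lr_schedule,
    weight_decay=WEIGHT_DECAY,
)
```

On older TensorFlow versions the same optimizer is available as `tfa.optimizers.AdamW` from `tensorflow-addons`, which is one reason the missing version numbers flagged in the Software Dependencies row matter for reproduction.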