Auto-scaling Vision Transformers without Training

Authors: Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wang, Denny Zhou

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our As-ViT achieves strong performance on classification (83.5% top-1 on ImageNet-1k) and detection (52.7% mAP on COCO)." and "Table 5 demonstrates comparisons of our As-ViT to other models. Compared to both previous Transformer-based and CNN-based architectures, As-ViT achieves state-of-the-art performance with a comparable number of parameters and FLOPs."
Researcher Affiliation | Collaboration | 1 University of Texas at Austin; 2 University of Technology Sydney; 3 Google. Contacts: {wuyang.chen,atlaswang}@utexas.edu, weihuang.uts@gmail.com, {xianzhi,xiaodansong,dennyzhou}@google.com
Pseudocode | Yes | "Algorithm 1: Training-free ViT Topology Search" and "Algorithm 2: Training-free Auto-Scaling ViTs" (a hedged sketch of a training-free search loop follows the table)
Open Source Code | Yes | "Our code is available at https://github.com/VITA-Group/AsViT."
Open Datasets | Yes | "Our As-ViT achieves strong performance on classification (83.5% top-1 on ImageNet-1k) and detection (52.7% mAP on COCO)." and "We benchmark our As-ViT on ImageNet-1k (Deng et al., 2009). Object detection is conducted on COCO 2017..."
Dataset Splits | Yes | "Object detection is conducted on COCO 2017 that contains 118,000 training and 5,000 validation images." (a data-loading sketch follows the table)
Hardware Specification | Yes | "the end-to-end model design and scaling process costs only 12 hours on one V100 GPU." and "We set the default image size as 224×224, and use AdamW (Loshchilov & Hutter, 2017) as the optimizer with cosine learning rate decay (Loshchilov & Hutter, 2016). A batch size of 1024, an initial learning rate of 0.001, and a weight decay of 0.05 are adopted."
Software Dependencies | No | "We use TensorFlow and Keras for training implementations and conduct all training on TPUs." The paper mentions software by name but does not provide specific version numbers.
Experiment Setup | Yes | "We set the default image size as 224×224, and use AdamW (Loshchilov & Hutter, 2017) as the optimizer with cosine learning rate decay (Loshchilov & Hutter, 2016). A batch size of 1024, an initial learning rate of 0.001, and a weight decay of 0.05 are adopted." (an optimizer-configuration sketch follows the table)
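
The pseudocode row names Algorithm 1 (Training-free ViT Topology Search). Below is a minimal Python sketch of what such a training-free search loop can look like; the SEARCH_SPACE and proxy_score names are illustrative placeholders introduced here, not the paper's actual search space or training-free measure.

```python
import random

# Illustrative search space (placeholder values, not the paper's actual space).
SEARCH_SPACE = {
    "depth": [12, 16, 20, 24],
    "embed_dim": [192, 256, 384],
    "num_heads": [3, 4, 6],
}

def sample_topology(space):
    """Draw one random topology from the search space."""
    return {name: random.choice(choices) for name, choices in space.items()}

def proxy_score(topology):
    """Stand-in for a training-free measure evaluated on the untrained network.

    A real implementation would score the network at initialization without any
    training; here a random number is returned so the loop runs end to end.
    """
    return random.random()

def search_topology(num_samples=100):
    """Rank randomly sampled topologies by the training-free proxy and keep the best."""
    best_topology, best_score = None, float("-inf")
    for _ in range(num_samples):
        topology = sample_topology(SEARCH_SPACE)
        score = proxy_score(topology)
        if score > best_score:
            best_topology, best_score = topology, score
    return best_topology

if __name__ == "__main__":
    print(search_topology())
```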
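The dataset-splits row quotes the COCO 2017 train/validation counts. The paper does not state its data-loading tooling; one convenient way to obtain those splits for a reproduction attempt is tensorflow_datasets, as sketched below (an assumption about tooling, not the authors' pipeline).

```python
import tensorflow_datasets as tfds

# Assumption: the paper does not specify its data pipeline; TFDS is just one
# source for the COCO 2017 splits quoted above.
(train_ds, val_ds), info = tfds.load(
    "coco/2017",
    split=["train", "validation"],
    with_info=True,
)

print(info.splits["train"].num_examples)       # ~118,000 training images
print(info.splits["validation"].num_examples)  # 5,000 validation images
```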
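The experiment-setup row quotes the optimizer hyperparameters. A sketch of the corresponding Keras configuration is given below, assuming TensorFlow >= 2.11 (where tf.keras.optimizers.AdamW is available); DECAY_STEPS is a placeholder, since the total step count is not quoted here.

```python
import tensorflow as tf

# Hyperparameters quoted in the experiment setup.
IMAGE_SIZE = 224       # input resolution (used by the data pipeline, not shown)
BATCH_SIZE = 1024      # global batch size (used by the data pipeline, not shown)
BASE_LR = 1e-3
WEIGHT_DECAY = 0.05
DECAY_STEPS = 100_000  # placeholder: depends on the number of epochs and dataset size

# Cosine learning-rate decay (Loshchilov & Hutter, 2016).
lr_schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=BASE_LR,
    decay_steps=DECAY_STEPS,
)

# AdamW (Loshchilov & Hutter, 2017); tf.keras.optimizers.AdamW requires TF >= 2.11
# (earlier versions expose an equivalent optimizer via tensorflow_addons).
optimizer = tf.keras.optimizers.AdamW(
    learning_rate=lr_schedule,
    weight_decay=WEIGHT_DECAY,
)
```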