Semi-supervised Vision Transformers at Scale

Authors: Zhaowei Cai, Avinash Ravichandran, Paolo Favaro, Manchen Wang, Davide Modolo, Rahul Bhotika, Zhuowen Tu, Stefano Soatto

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our proposed method, dubbed Semi-ViT, achieves comparable or better performance than the CNN counterparts in the semi-supervised classification setting. Semi-ViT also enjoys the scalability benefits of ViTs that can be readily scaled up to large-size models with increasing accuracy. For example, Semi-ViT-Huge achieves an impressive 80% top-1 accuracy on ImageNet using only 1% labels, which is comparable with Inception-v4 using 100% ImageNet labels. The code is available at https://github.com/amazon-science/semi-vit. (4 Experiments) We evaluate Semi-ViT mainly on ImageNet, which consists of 1.28M training and 50K validation images. We sample 10%/1% labels from the ImageNet training set for the semi-supervised evaluation.
Researcher Affiliation | Industry | AWS AI Labs {zhaoweic,ravinash,pffavaro,manchenw,dmodolo,ztu,soattos}@amazon.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (e.g., sections labeled "Pseudocode" or "Algorithm").
Open Source Code | Yes | The code is available at https://github.com/amazon-science/semi-vit.
Open Datasets | Yes | We evaluate Semi-ViT mainly on ImageNet, which consists of 1.28M training and 50K validation images. We sample 10%/1% labels from the ImageNet training set for the semi-supervised evaluation. ImageNet is cited as [54]: "Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, and Li Fei-Fei. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis., 115(3):211–252, 2015."
Dataset Splits | Yes | We evaluate Semi-ViT mainly on ImageNet, which consists of 1.28M training and 50K validation images. We sample 10%/1% labels from the ImageNet training set for the semi-supervised evaluation.
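The 10%/1% label-sampling protocol quoted above can be sketched in plain Python. The helper name, the dict-of-index-lists data layout, and the per-class rounding rule are illustrative assumptions, not the paper's actual sampling code:

```python
import random

def sample_labeled_subset(indices_by_class, fraction, seed=0):
    # Illustrative sketch (not the paper's code): draw roughly
    # `fraction` of the examples from each class so the labeled
    # subset stays class-balanced, as is typical for the
    # ImageNet 10%/1% semi-supervised protocol.
    rng = random.Random(seed)
    chosen = []
    for cls, indices in indices_by_class.items():
        k = max(1, round(fraction * len(indices)))  # at least one label per class
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)

# Toy example: two classes of 100 images each, 1% labels -> one index per class.
subset = sample_labeled_subset({0: list(range(100)), 1: list(range(100, 200))}, 0.01)
```

The remaining (unsampled) training images would then be treated as the unlabeled pool.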
Hardware Specification | No | The provided paper text does not include specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. While the ethical review section mentions that this information is available in the supplementary material, it is not present in the main paper text provided.
Software Dependencies | No | The paper mentions software components and techniques (e.g., AdamW, mixup, CutMix, and refers to "PyTorch Image Models" for implementation), but it does not specify explicit version numbers for these software dependencies (e.g., Python version, PyTorch version, or specific library versions).
Experiment Setup | Yes | All learning is optimized with AdamW [46], using a cosine learning rate schedule, with a weight decay of 0.05. The default momentum decay m of (1) is 0.9999. In a minibatch, Nu = 5Nl, and the loss trade-off µ = 5. The mixup is a combination of mixup [75] and CutMix [74] as in the implementation of [69]. More details can be found in the appendix. (Tables 4 and 5 additionally ablate the confidence threshold and the momentum decay, providing specific hyperparameter settings.)
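The two scalar update rules quoted in this row — the EMA teacher update with momentum decay m = 0.9999, and the total loss with trade-off µ = 5 — can be sketched in plain Python. The function names and the flat list-of-floats parameter representation are assumptions for illustration, not the released implementation:

```python
def ema_update(teacher, student, m=0.9999):
    # Exponential-moving-average teacher update with momentum decay m:
    #   teacher <- m * teacher + (1 - m) * student
    # With m = 0.9999 the teacher tracks the student very slowly.
    return [m * t + (1.0 - m) * s for t, s in zip(teacher, student)]

def total_loss(sup_loss, unsup_loss, mu=5.0):
    # Total objective as a weighted sum, with loss trade-off mu = 5:
    #   L = L_sup + mu * L_unsup
    return sup_loss + mu * unsup_loss

# Toy example: a freshly initialized teacher moves only 0.01% toward the student.
teacher = ema_update([0.0, 0.0], [1.0, 2.0])
loss = total_loss(0.5, 0.1)  # 0.5 + 5 * 0.1 = 1.0
```

Note that with Nu = 5Nl and µ = 5, the unlabeled examples dominate both the minibatch composition and the loss weighting.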