Semi-supervised Vision Transformers at Scale

Authors: Zhaowei Cai, Avinash Ravichandran, Paolo Favaro, Manchen Wang, Davide Modolo, Rahul Bhotika, Zhuowen Tu, Stefano Soatto

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our proposed method, dubbed Semi-ViT, achieves comparable or better performance than the CNN counterparts in the semi-supervised classification setting. Semi-ViT also enjoys the scalability benefits of ViTs that can be readily scaled up to large-size models with increasing accuracy. For example, Semi-ViT-Huge achieves an impressive 80% top-1 accuracy on ImageNet using only 1% labels, which is comparable with Inception-v4 using 100% ImageNet labels. The code is available at https://github.com/amazon-science/semi-vit. (4 Experiments) We evaluate Semi-ViT mainly on ImageNet, which consists of 1.28M training and 50K validation images. We sample 10%/1% labels from the ImageNet training set for the semi-supervised evaluation.
Researcher Affiliation | Industry | AWS AI Labs {zhaoweic,ravinash,pffavaro,manchenw,dmodolo,ztu,soattos}@amazon.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (e.g., sections labeled "Pseudocode" or "Algorithm").
Open Source Code | Yes | The code is available at https://github.com/amazon-science/semi-vit.
Open Datasets | Yes | We evaluate Semi-ViT mainly on ImageNet, which consists of 1.28M training and 50K validation images. We sample 10%/1% labels from the ImageNet training set for the semi-supervised evaluation. ImageNet is cited as [54]: "Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, and Li Fei-Fei. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis., 115(3):211–252, 2015."
Dataset Splits | Yes | We evaluate Semi-ViT mainly on ImageNet, which consists of 1.28M training and 50K validation images. We sample 10%/1% labels from the ImageNet training set for the semi-supervised evaluation.
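The 10%/1% label-sampling protocol quoted above can be sketched in plain Python. The helper name, the dict-of-index-lists data layout, and the per-class rounding rule are illustrative assumptions, not the paper's actual sampling code:

```python
import random

def sample_labeled_subset(indices_by_class, fraction, seed=0):
    # Illustrative sketch (not the paper's code): draw roughly
    # `fraction` of the examples from each class so the labeled
    # subset stays class-balanced, as is typical for the
    # ImageNet 10%/1% semi-supervised protocol.
    rng = random.Random(seed)
    chosen = []
    for cls, indices in indices_by_class.items():
        k = max(1, round(fraction * len(indices)))  # at least one label per class
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)

# Toy example: two classes of 100 images each, 1% labels -> one index per class.
subset = sample_labeled_subset({0: list(range(100)), 1: list(range(100, 200))}, 0.01)
```

The remaining (unsampled) training images would then be treated as the unlabeled pool.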
Hardware Specification | No | The provided paper text does not include specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. While the ethical review section mentions that this information is available in the supplementary material, it is not present in the main paper text provided.
Software Dependencies | No | The paper mentions software components and techniques (e.g., AdamW, mixup, CutMix, and refers to "PyTorch Image Models" for implementation), but it does not specify explicit version numbers for these software dependencies (e.g., Python version, PyTorch version, or specific library versions).
Experiment Setup | Yes | All learning is optimized with AdamW [46], using a cosine learning rate schedule, with a weight decay of 0.05. The default momentum decay m of (1) is 0.9999. In a minibatch, Nu = 5Nl, and the loss trade-off µ = 5. The mixup is a combination of mixup [75] and CutMix [74] as in the implementation of [69]. More details can be found in the appendix. (Tables 4 and 5 additionally ablate the confidence threshold and the momentum decay, providing specific hyperparameter settings.)
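The two scalar update rules quoted in this row — the EMA teacher update with momentum decay m = 0.9999, and the total loss with trade-off µ = 5 — can be sketched in plain Python. The function names and the flat list-of-floats parameter representation are assumptions for illustration, not the released implementation:

```python
def ema_update(teacher, student, m=0.9999):
    # Exponential-moving-average teacher update with momentum decay m:
    #   teacher <- m * teacher + (1 - m) * student
    # With m = 0.9999 the teacher tracks the student very slowly.
    return [m * t + (1.0 - m) * s for t, s in zip(teacher, student)]

def total_loss(sup_loss, unsup_loss, mu=5.0):
    # Total objective as a weighted sum, with loss trade-off mu = 5:
    #   L = L_sup + mu * L_unsup
    return sup_loss + mu * unsup_loss

# Toy example: a freshly initialized teacher moves only 0.01% toward the student.
teacher = ema_update([0.0, 0.0], [1.0, 2.0])
loss = total_loss(0.5, 0.1)  # 0.5 + 5 * 0.1 = 1.0
```

Note that with Nu = 5Nl and µ = 5, the unlabeled examples dominate both the minibatch composition and the loss weighting.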