Positional Label for Self-Supervised Vision Transformer
Authors: Zhemin Zhang, Xun Gong
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that with the proposed self-supervised methods, ViT-B and Swin-B gain improvements of 1.20% (top-1 Acc) and 0.74% (top-1 Acc) on ImageNet, respectively, and 6.15% and 1.14% improvement on Mini-ImageNet. |
| Researcher Affiliation | Academia | Zhemin Zhang¹, Xun Gong¹,²,³*. ¹School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan, China; ²Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, China; ³Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, Chengdu, Sichuan, China. zheminzhang@my.swjtu.edu.cn, xgong@swjtu.edu.cn |
| Pseudocode | No | The paper describes methods using mathematical equations and figures, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is publicly available at: https://github.com/zhangzhemin/Positional Label. |
| Open Datasets | Yes | For image classification, we benchmark the proposed positional label on the ImageNet-1K, which contains 1.28M training images and 50K validation images from 1,000 classes. To explore the performance of positional label on small datasets, we also conducted experiments on Caltech-256 (Griffin, Holub, and Perona 2007) and Mini-ImageNet (Krizhevsky, Sutskever, and Hinton 2012). |
| Dataset Splits | Yes | For image classification, we benchmark the proposed positional label on the ImageNet-1K, which contains 1.28M training images and 50K validation images from 1,000 classes. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as CPU/GPU models or memory. |
| Software Dependencies | No | We use the PyTorch toolbox (Paszke et al. 2019) to implement all our experiments. While PyTorch is mentioned, a specific version number is not provided, nor are other software dependencies with versions. |
| Experiment Setup | Yes | We employ an AdamW (Kingma and Ba 2014) optimizer for 300 epochs using a cosine decay learning rate scheduler and 20 epochs of linear warm-up. A batch size of 256, an initial learning rate of 0.001, and a weight decay of 0.05 are used. ViT-B/16 uses an image size of 384 and the others use 224. (A minimal sketch of this schedule follows the table.) |
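
To make the quoted recipe concrete, here is a minimal PyTorch sketch of the reported optimization setup: AdamW with an initial learning rate of 0.001 and weight decay of 0.05, 20 epochs of linear warm-up, then cosine decay over the remaining 280 of 300 epochs. The model and training loop are placeholders (the paper trains ViT-B/Swin-B on ImageNet-1K at batch size 256); this is a sketch under those assumptions, not the authors' released code.

```python
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Hyperparameters as quoted in the paper's experiment setup.
EPOCHS = 300
WARMUP_EPOCHS = 20
BASE_LR = 1e-3
WEIGHT_DECAY = 0.05
BATCH_SIZE = 256  # quoted batch size; the actual data pipeline is omitted here

# Placeholder model: the paper uses ViT-B or Swin-B, but any nn.Module fits.
model = torch.nn.Linear(768, 1000)

optimizer = AdamW(model.parameters(), lr=BASE_LR, weight_decay=WEIGHT_DECAY)

def warmup_cosine(epoch: int) -> float:
    """LR multiplier: linear warm-up for 20 epochs, then cosine decay to 0."""
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda=warmup_cosine)

for epoch in range(EPOCHS):
    # ... one training pass over ImageNet-1K would go here ...
    optimizer.step()   # placeholder so scheduler.step() follows an optimizer step
    scheduler.step()   # advance the warm-up/cosine schedule once per epoch
```

Whether the schedule is stepped per epoch or per iteration is not stated in the quoted text; a per-iteration variant would use the same multiplier with epoch counts replaced by step counts.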