DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions
Authors: Haochen Wang, Junsong Fan, Yuxi Wang, Kaiyou Song, Tong Wang, Zhaoxiang Zhang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations of DropPos show strong capabilities. DropPos outperforms supervised pre-training and achieves competitive results compared with state-of-the-art self-supervised alternatives on a wide range of downstream benchmarks. The code is publicly available at https://github.com/Haochen-Wang409/DropPos. |
| Researcher Affiliation | Collaboration | Haochen Wang (1,3), Junsong Fan (1,4), Yuxi Wang (1,4), Kaiyou Song (2), Tong Wang (2), Zhaoxiang Zhang (1,3,4). Affiliations: (1) Center for Research on Intelligent Perception and Computing (CRIPAC), State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences (CASIA); (2) Megvii Technology; (3) University of Chinese Academy of Sciences (UCAS); (4) Centre for Artificial Intelligence and Robotics, HKISI_CAS. |
| Pseudocode | Yes | Algorithm 1: Pseudo-code of DropPos (a hedged sketch of this procedure is given below the table). |
| Open Source Code | Yes | The code is publicly available at https://github.com/Haochen-Wang409/DropPos. |
| Open Datasets | Yes | We perform self-supervised pre-training on the ImageNet-1K [48] training set with a resolution of 224x224. |
| Dataset Splits | Yes | We perform self-supervised pre-training on the ImageNet-1K [48] training set with a resolution of 224x224. We report top-1 validation accuracy of a single 224x224 crop. |
| Hardware Specification | Yes | For ViT-B/16, pre-training and fine-tuning are conducted with 64 and 32 2080Ti GPUs, respectively. For ViT-L/16, pre-training and fine-tuning are conducted with 32 and 16 Tesla V100 GPUs, respectively. Experiments are conducted on 8 Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions software like Detectron2 [61], ViTDet [36], and MMSegmentation [14], but it does not specify the version numbers for these or any other key software components used in the experiments. |
| Experiment Setup | Yes | Hyperparameters (pre-training / fine-tuning): optimizer: AdamW / AdamW; base learning rate: 1.5e-4 / 1e-3; weight decay: 0.05 / 0.05; optimizer momentum: β1, β2 = 0.9, 0.95 / β1, β2 = 0.9, 0.999; layer-wise lr decay: 1.0 / 0.8; batch size: 4096 / 1024; learning rate schedule: cosine decay / cosine decay; warmup epochs: 10 (ViT-B/16), 40 (ViT-L/16) / 5; training epochs: 200 / 100 (ViT-B/16), 50 (ViT-L/16); augmentation: RandomResizedCrop / RandAug (9, 0.5) [16]; label smoothing: – / 0.1; mixup [68]: – / 0.8; cutmix [65]: – / 1.0; drop path [31]: – / 0.1. A hedged sketch of the pre-training optimizer setup is also given below the table. |
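
The Algorithm 1 entry above refers to the paper's pseudo-code for DropPos: position embeddings are dropped for a random subset of patch tokens, and the model is trained to classify which of the N patch positions each position-dropped token came from. Below is a minimal PyTorch sketch of that objective written for this report; the class and parameter names (DropPosSketch, gamma, mask_pos_token) are illustrative assumptions, and the official implementation at https://github.com/Haochen-Wang409/DropPos additionally uses patch masking, position smoothing, and attentive reconstruction.

```python
# Hedged sketch of the DropPos objective; names and the toy encoder are assumptions,
# not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DropPosSketch(nn.Module):
    def __init__(self, num_patches=196, dim=192, depth=4, heads=3, gamma=0.25):
        super().__init__()
        self.gamma = gamma  # ratio of patches that KEEP their position embedding
        self.pos_embed = nn.Parameter(torch.randn(1, num_patches, dim) * 0.02)
        self.mask_pos_token = nn.Parameter(torch.zeros(1, 1, dim))  # stands in for dropped positions
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, num_patches)  # classify which of the N positions a patch came from

    def forward(self, patch_tokens):
        """patch_tokens: (B, N, dim) patch embeddings in their original spatial order."""
        B, N, _ = patch_tokens.shape
        # Randomly decide, per sample, which patches keep their position embedding.
        keep = torch.rand(B, N, device=patch_tokens.device) < self.gamma          # (B, N) bool
        pos = torch.where(keep.unsqueeze(-1),
                          self.pos_embed.expand(B, -1, -1),
                          self.mask_pos_token.expand(B, N, -1))
        logits = self.head(self.encoder(patch_tokens + pos))                      # (B, N, N) position logits
        # Cross-entropy only on patches whose position embeddings were dropped.
        target = torch.arange(N, device=patch_tokens.device).expand(B, N)
        return F.cross_entropy(logits[~keep], target[~keep])


# Toy usage with random patch embeddings standing in for a patchified image.
model = DropPosSketch()
loss = model(torch.randn(2, 196, 192))
loss.backward()
```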
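
The base learning rates in the experiment-setup row are presumably combined with the linear scaling rule common to MAE-style recipes (effective lr = base lr × batch size / 256); the table itself does not state this, so the sketch below is an assumption, showing how the pre-training AdamW optimizer and cosine schedule with warmup could be instantiated from the listed ViT-B/16 values.

```python
# Hedged sketch of the pre-training optimizer/schedule implied by the table.
# The lr scaling rule is an assumption; only the base values come from the paper.
import math
import torch

base_lr, batch_size, weight_decay = 1.5e-4, 4096, 0.05
warmup_epochs, total_epochs = 10, 200            # ViT-B/16 pre-training values from the table
lr = base_lr * batch_size / 256                  # assumed scaling rule, not stated in the table

model = torch.nn.Linear(8, 8)                    # placeholder for the ViT backbone
optimizer = torch.optim.AdamW(model.parameters(), lr=lr,
                              betas=(0.9, 0.95), weight_decay=weight_decay)

def lr_at(epoch):
    """Cosine decay with linear warmup, per the 'learning rate schedule' row."""
    if epoch < warmup_epochs:
        return lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for g in optimizer.param_groups:
    g["lr"] = lr_at(epoch=50)                    # example: set the lr for epoch 50
```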