What Do Self-Supervised Vision Transformers Learn?
Authors: Namuk Park, Wonjae Kim, Byeongho Heo, Taekyung Kim, Sangdoo Yun
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a comparative study on how and why contrastive learning (CL) and masked image modeling (MIM) differ in their representations and in their performance of downstream tasks. |
| Researcher Affiliation | Industry | Namuk Park1 Wonjae Kim2 Byeongho Heo2 Taekyung Kim2 Sangdoo Yun2 1Prescient Design, Genentech 2NAVER AI Lab park.namuk@gene.com {wonjae.kim,bh.heo,taekyung.k,sangdoo.yun}@navercorp.com |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The code for analysis is available at https://github.com/naver-ai/cl-vs-mim. |
| Open Datasets | Yes | Our analyses mainly compare ViT-B/16 pre-trained on ImageNet-1K (Russakovsky et al., 2015) |
| Dataset Splits | Yes | We use the ImageNet validation images for our experiments. Table A.1: Training settings. batch size 1k, training epoch 50 |
| Hardware Specification | Yes | All experiments use {1, 4, 8} NVIDIA A100 Tensor Core GPU. |
| Software Dependencies | No | Neural network models are implemented in PyTorch (Paszke et al., 2019). |
| Experiment Setup | Yes | Table A.1: Training settings (values listed per configuration): optimizer sgd / adamw / adamw; base learning rate 1.0e-0 / 1.25e-3 / 1.0e-4; weight decay 0.05 / 0.05 / 0.05; batch size 1k / 2k / 1k; training epoch 50 / 100 / 100; learning rate schedule cosine / cosine / multistep; warmup epoch 0 / 20 / 10; warmup schedule linear / linear; randaugment 9, 0.5 / 9, 0.5; label smoothing 0.1 / 0.1; mixup 0.8 / 0.8; cutmix 1.0 / 1.0; stochastic depth 0.1 / 0.1; layer decay 0.65 / 1.0; gradient clip 5.0 / 5.0 |
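
The Table A.1 values quoted above can be read as optimizer and schedule hyperparameters. As a rough illustration only, the sketch below assembles one such configuration in PyTorch; it assumes the second set of values (AdamW, base learning rate 1.25e-3, batch size 2k, 100 epochs, 20 warmup epochs, cosine schedule, gradient clip 5.0) belongs to the fine-tuning setup, and it uses a placeholder model rather than the authors' ViT-B/16 pipeline. Names such as `cfg` and `lr_at_epoch` are illustrative, not from the paper or its repository.

```python
import math
import torch

# Hypothetical configuration assembled from the Table A.1 values quoted above
# (column-to-setup mapping is an assumption, not confirmed by the paper).
cfg = dict(
    base_lr=1.25e-3,
    weight_decay=0.05,
    batch_size=2048,   # "2k" in the table; informational only in this sketch
    epochs=100,
    warmup_epochs=20,
    grad_clip=5.0,
)

# Placeholder module standing in for a ViT-B/16 backbone with a classification head.
model = torch.nn.Linear(768, 1000)

optimizer = torch.optim.AdamW(
    model.parameters(), lr=cfg["base_lr"], weight_decay=cfg["weight_decay"]
)

def lr_at_epoch(epoch: int) -> float:
    """Linear warmup followed by cosine decay, matching the schedule listed in Table A.1."""
    if epoch < cfg["warmup_epochs"]:
        return (epoch + 1) / cfg["warmup_epochs"]
    progress = (epoch - cfg["warmup_epochs"]) / (cfg["epochs"] - cfg["warmup_epochs"])
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_at_epoch)

# Inside a training loop, gradients would be clipped before each optimizer step, e.g.:
# torch.nn.utils.clip_grad_norm_(model.parameters(), cfg["grad_clip"])
# and scheduler.step() would be called once per epoch.
```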