What Do Self-Supervised Vision Transformers Learn?

Authors: Namuk Park, Wonjae Kim, Byeongho Heo, Taekyung Kim, Sangdoo Yun

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present a comparative study on how and why contrastive learning (CL) and masked image modeling (MIM) differ in their representations and in their performance of downstream tasks."
Researcher Affiliation | Industry | Namuk Park (Prescient Design, Genentech); Wonjae Kim, Byeongho Heo, Taekyung Kim, Sangdoo Yun (NAVER AI Lab). Contact: park.namuk@gene.com, {wonjae.kim,bh.heo,taekyung.k,sangdoo.yun}@navercorp.com
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | "The code for analysis is available at https://github.com/naver-ai/cl-vs-mim."
Open Datasets | Yes | "Our analyses mainly compare ViT-B/16 pre-trained on ImageNet-1K (Russakovsky et al., 2015)"
Dataset Splits | Yes | "We use the ImageNet validation images for our experiments." Table A.1 (training settings) also specifies batch size 1k and 50 training epochs.
Hardware Specification | Yes | "All experiments use {1, 4, 8} NVIDIA A100 Tensor Core GPUs."
Software Dependencies | No | "Neural network models are implemented in PyTorch (Paszke et al., 2019)."
Experiment Setup | Yes | Table A.1: Training settings, covering three training configurations:

Setting | Config 1 | Config 2 | Config 3
optimizer | sgd | adamw | adamw
base learning rate | 1.0e-0 | 1.25e-3 | 1.0e-4
weight decay | 0.05 | 0.05 | 0.05
batch size | 1k | 2k | 1k
training epochs | 50 | 100 | 100
learning rate schedule | cosine | cosine | multistep
warmup epochs | 0 | 20 | 10
warmup schedule | – | linear | linear
randaugment | – | 9, 0.5 | 9, 0.5
label smoothing | – | 0.1 | 0.1
mixup | – | 0.8 | 0.8
cutmix | – | 1.0 | 1.0
stochastic depth | – | 0.1 | 0.1
layer decay | – | 0.65 | 1.0
gradient clip | – | 5.0 | 5.0
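The AdamW recipe can be expressed directly in PyTorch. Below is a minimal sketch assuming Config 2 above is an end-to-end fine-tuning recipe (base LR 1.25e-3, weight decay 0.05, 20 linear warmup epochs, cosine decay over 100 epochs); the model is a placeholder, and layer-wise LR decay (0.65) and the augmentations (RandAugment, mixup, CutMix, stochastic depth) are omitted, so this illustrates the schedule rather than reproducing the authors' training code.

```python
import math
import torch

# Hypothetical stand-in for the ViT-B/16 backbone used in the paper.
model = torch.nn.Linear(768, 1000)

base_lr = 1.25e-3      # Config 2 of Table A.1
weight_decay = 0.05
warmup_epochs = 20
total_epochs = 100

optimizer = torch.optim.AdamW(
    model.parameters(), lr=base_lr, weight_decay=weight_decay
)

def lr_lambda(epoch: int) -> float:
    # Linear warmup for the first 20 epochs, then cosine decay to zero.
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for epoch in range(total_epochs):
    # ... one pass over ImageNet-1K would go here (omitted) ...
    optimizer.step()   # placeholder; real training steps once per batch
    scheduler.step()   # the schedule advances once per epoch
```

The gradient clip of 5.0 listed in Table A.1 would be applied per batch, e.g. with torch.nn.utils.clip_grad_norm_(model.parameters(), 5.0) between the backward pass and optimizer.step().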