Exploring The Role of Mean Teachers in Self-supervised Masked Auto-Encoders

Authors: Youngwan Lee, Jeffrey Ryan Willette, Jonghee Kim, Juho Lee, Sung Ju Hwang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments follow the same architecture, settings, and pre-training recipe as MAE (He et al., 2022), and we find that the simple addition of a teacher (RC-MAE) consistently outperforms MAE in all model sizes (e.g., ViT-S, ViT-B, and ViT-L) when fine-tuned for ImageNet classification." (See the mean-teacher EMA sketch after this table.)
Researcher Affiliation | Collaboration | (1) Electronics and Telecommunications Research Institute (ETRI), South Korea; (2) Korea Advanced Institute of Science and Technology (KAIST), South Korea
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper mentions and links to the official code for MAE (He et al., 2022), a baseline method, but does not provide an explicit statement or link for open-source code of RC-MAE, the method proposed in this paper.
Open Datasets | Yes | "For experiments, we pre-train on ImageNet-1K and evaluate linear probing (LN) and end-to-end fine-tuning (FT) for classification, and COCO object detection & instance segmentation, for which we use the Mask R-CNN benchmark (Li et al., 2021) for dense prediction."
Dataset Splits | Yes | "For experiments, we pre-train on ImageNet-1K and evaluate linear probing (LN) and end-to-end fine-tuning (FT) for classification, and COCO object detection & instance segmentation..." and "...we use the same model weights (RC-MAE w/ ViT-B for 1600 epochs) fine-tuned on the original ImageNet-1K as shown in Table 4 and only test, without any specialized fine-tuning, on the different validation sets, such as ImageNet-C (Hendrycks & Dietterich, 2019), -A (Hendrycks et al., 2021b), -R (Hendrycks et al., 2021a), and -Sketch (Wang et al., 2019)."
Hardware Specification | Yes | "Although He et al. (2022) used 128 TPU-v3 cores, we have tried to reproduce the baseline MAE and train our RC-MAE on the same local GPU environment, which has 8 NVIDIA V100 GPUs (32GB), for more accessibility in the community."
Software Dependencies | No | The paper mentions software components such as PyTorch and the AdamW and LARS optimizers, but does not provide specific version numbers for these or other key libraries/frameworks, which are necessary for reproducible software dependencies.
Experiment Setup | Yes | The paper provides detailed settings for pre-training (Table 9), end-to-end fine-tuning (Table 10), and linear probing (Table 11), including optimizer, learning rates, batch sizes, epochs, weight decay, and various augmentation strategies.
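To make the Research Type row concrete, here is a minimal sketch of the generic mean-teacher (EMA) update that RC-MAE layers on top of standard MAE pre-training. Everything here is an assumption for illustration: the function name `ema_update`, the momentum value of 0.999, and the placeholder helpers `build_masked_autoencoder`, `reconstruction_loss`, and `consistency_loss` are hypothetical and do not come from the authors' released code.

```python
import copy

import torch
import torch.nn as nn


@torch.no_grad()
def ema_update(teacher: nn.Module, student: nn.Module, momentum: float = 0.999) -> None:
    """Refresh the mean teacher as an exponential moving average (EMA) of the student.

    The momentum value (0.999) is illustrative, not necessarily the paper's exact schedule.
    """
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.data.mul_(momentum).add_(s_param.data, alpha=1.0 - momentum)


# Hypothetical usage inside an MAE-style pre-training loop:
#   student = build_masked_autoencoder()   # placeholder for the MAE encoder-decoder
#   teacher = copy.deepcopy(student)       # teacher starts as a copy of the student
#   for images in loader:
#       loss = reconstruction_loss(student, images) \
#              + consistency_loss(student, teacher, images)  # teacher supplies extra targets
#       loss.backward()
#       optimizer.step()
#       optimizer.zero_grad()
#       ema_update(teacher, student)       # teacher tracks the student after each step
```

The design point this sketch illustrates is that the teacher receives no gradients; it only tracks the student's weights through the EMA, which is what "mean teacher" refers to in the paper's title.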