Exploring The Role of Mean Teachers in Self-supervised Masked Auto-Encoders
Authors: Youngwan Lee, Jeffrey Ryan Willette, Jonghee Kim, Juho Lee, Sung Ju Hwang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments follow the same architecture, settings, and pre-training recipe as MAE (He et al., 2022), and we find that the simple addition of a teacher (RC-MAE) consistently outperforms MAE across all model sizes (e.g., ViT-S, ViT-B, and ViT-L) when fine-tuned for ImageNet classification. (A hedged sketch of the EMA mean-teacher mechanism appears after this table.) |
| Researcher Affiliation | Collaboration | 1Electronics and Telecommunications Research Institute (ETRI), South Korea 2Korea Advanced Institute of Science and Technology (KAIST), South Korea |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper mentions and links to the official code of the baseline MAE (He et al., 2022), but does not provide an explicit statement or link for open-source code of RC-MAE, the method proposed in this paper. |
| Open Datasets | Yes | For experiments, we pre-train on ImageNet-1K and evaluate linear probing (LN) and end-to-end fine-tuning (FT) for classification, and COCO object detection & instance segmentation, for which we use the Mask R-CNN benchmark (Li et al., 2021) for dense prediction. |
| Dataset Splits | Yes | For experiments, we pre-train on ImageNet-1K and evaluate linear probing (LN) and end-to-end fine-tuning (FT) for classification and COCO object detection & instance segmentation... and ...we use the same model weights (RC-MAE w/ ViT-B for 1600 epochs) fine-tuned on the original ImageNet-1K as shown in Table 4 and only test without any specialized fine-tuning on the different validation sets, such as ImageNet-C (Hendrycks & Dietterich, 2019), -A (Hendrycks et al., 2021b), -R (Hendrycks et al., 2021a), and -Sketch (Wang et al., 2019). |
| Hardware Specification | Yes | Although He et al. (2022) used 128 TPU-v3 cores, we have tried to reproduce the baseline MAE and train our RC-MAE on the same local GPU environment, which has 8 NVIDIA V100 GPUs (32GB) for more accessibility in the community. |
| Software Dependencies | No | The paper mentions software components such as PyTorch and the AdamW and LARS optimizers, but does not provide specific version numbers for these or other key libraries/frameworks, which would be needed to reproduce the software environment. |
| Experiment Setup | Yes | The paper provides detailed settings for pre-training (Table 9), end-to-end fine-tuning (Table 10), and linear probing (Table 11), including optimizer, learning rates, batch sizes, epochs, weight decay, and various augmentation strategies. (A hedged optimizer-setup sketch follows below the table.) |
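
As referenced in the Research Type row, RC-MAE adds a mean teacher on top of a standard MAE student. The following is a minimal PyTorch-style sketch of that idea, not the authors' released implementation: the module interface (`forward_masked`, `patchify`), the equal weighting of the two loss terms, and the momentum value are assumptions made for illustration only.

```python
import torch

# Minimal sketch: an exponential-moving-average (EMA) "mean teacher"
# wrapped around an MAE-style student. `forward_masked` and `patchify`
# are hypothetical placeholders, not the RC-MAE API.

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, momentum: float = 0.996) -> None:
    """Update the teacher's weights as an EMA of the student's weights."""
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(momentum).add_(p_s.detach(), alpha=1.0 - momentum)

def training_step(student, teacher, images, mask, optimizer):
    """One step: MAE pixel-reconstruction loss on masked patches, plus a
    consistency term between student and (gradient-free) teacher outputs."""
    pred_s = student.forward_masked(images, mask)       # (B, N, D) student reconstruction
    with torch.no_grad():
        pred_t = teacher.forward_masked(images, mask)   # teacher reconstruction, no gradients
    target = student.patchify(images)                   # (B, N, D) pixel targets

    loss_rec = ((pred_s - target) ** 2)[mask].mean()    # standard MAE loss, masked patches only
    loss_con = ((pred_s - pred_t) ** 2)[mask].mean()    # teacher-consistency term

    loss = loss_rec + loss_con                          # equal weighting assumed here
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(teacher, student)                        # teacher tracks the student via EMA
    return loss.item()
```

The teacher would typically be initialized as a frozen copy of the student (e.g. `copy.deepcopy(student)` with gradients disabled); the exact loss formulation, weighting, and momentum schedule should be taken from the paper itself.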
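
For the Experiment Setup row, the authoritative values are in Tables 9-11 of the paper. As a hedged illustration of how such a pre-training recipe is typically wired up, the sketch below uses the widely cited MAE defaults (AdamW, base learning rate 1.5e-4 scaled linearly by batch size / 256, betas (0.9, 0.95), weight decay 0.05); these numbers are assumptions and may differ from the paper's tables.

```python
import torch

# Hedged sketch of an MAE-style pre-training optimizer setup.
# The numeric defaults follow the common MAE recipe (He et al., 2022)
# and are not guaranteed to match RC-MAE's Tables 9-11.

def build_pretrain_optimizer(model: torch.nn.Module,
                             total_batch_size: int = 4096,
                             base_lr: float = 1.5e-4,
                             weight_decay: float = 0.05) -> torch.optim.AdamW:
    # Linear scaling rule: lr = base_lr * total_batch_size / 256
    lr = base_lr * total_batch_size / 256
    return torch.optim.AdamW(model.parameters(),
                             lr=lr,
                             betas=(0.9, 0.95),
                             weight_decay=weight_decay)
```

In MAE-style pre-training this optimizer is usually paired with a warm-up phase followed by a cosine learning-rate decay.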