SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders
Authors: Gang Li, Heliang Zheng, Daqing Liu, Chaoyue Wang, Bing Su, Changwen Zheng
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various vision tasks show that SemMAE can learn better image representation by integrating semantic information. In particular, SemMAE achieves 84.5% fine-tuning accuracy on ImageNet-1k, which outperforms the vanilla MAE by 1.4%. In the semantic segmentation and fine-grained recognition tasks, SemMAE also brings significant improvements and yields state-of-the-art performance. |
| Researcher Affiliation | Collaboration | Gang Li (1,2), Heliang Zheng (3), Daqing Liu (3), Chaoyue Wang (3), Bing Su (4), Changwen Zheng (1). Affiliations: (1) Institute of Software, Chinese Academy of Sciences; (2) University of Chinese Academy of Sciences; (3) JD Explore Academy; (4) Renmin University of China. Emails: ucasligang@gmail.com, {zhengheliang,liudaqing1,wangchaoyue9}@jd.com, bingsu@ruc.edu.cn, changwen@iscas.ac.cn |
| Pseudocode | Yes | Algorithm 1: Algorithm of Semantic-Guided Masking in a PyTorch-like style (an illustrative sketch of the idea follows this table). |
| Open Source Code | Yes | Our code is available at https://github.com/ucasligang/SemMAE. |
| Open Datasets | Yes | The experiment is performed on the ImageNet-1k [11] dataset. |
| Dataset Splits | No | The paper mentions using ImageNet-1k for pre-training and various downstream tasks, but it does not explicitly specify the training/validation/test dataset splits (e.g., percentages or sample counts) needed to reproduce the experiments. It refers to standard settings but not specific splits for validation. |
| Hardware Specification | Yes | Our model converges fast, taking only 2 hours on one A100 GPU card. ... Our model is trained on 16 A100 GPUs for 3 days. |
| Software Dependencies | No | The paper mentions a 'PyTorch-like style' in Algorithm 1, 'AdamW' as the optimizer, and specific models like 'ViT-small', but does not provide version numbers for software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We follow the most common setting and optimize our model with AdamW [27] using a learning rate of 2.4e-3. The batch size is set to 4096, and the weight decay is set to 0.05. We use a cosine learning rate strategy [26] with warmup [17]. The warmup is set to 40 epochs, and we pre-train our model for 800 epochs. For data augmentation, we only employ random horizontal flipping in the pre-training stage. The hyper-parameter γ in Algorithm 1 is experimentally set to 2. |
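
The paper's Algorithm 1 (semantic-guided masking in a PyTorch-like style) is only cited above, not reproduced. The sketch below is a minimal, illustrative take on the general idea, assuming a per-patch tensor of semantic-part assignments and masking a fixed fraction of patches within each part rather than uniformly at random; the function name, the per-part uniform sampling, and the omission of the γ schedule are assumptions, not the authors' implementation.

```python
import torch

def semantic_guided_mask(part_assignments: torch.Tensor, mask_ratio: float) -> torch.Tensor:
    """Illustrative sketch: mask `mask_ratio` of the patches within each semantic part,
    instead of sampling patches uniformly at random over the whole image (vanilla MAE).

    part_assignments: (N,) long tensor giving a part index for each of N patches.
    Returns a boolean mask of shape (N,) where True marks a masked patch.
    """
    n_patches = part_assignments.numel()
    mask = torch.zeros(n_patches, dtype=torch.bool)
    for part_id in part_assignments.unique():
        # Indices of the patches belonging to this semantic part.
        idx = (part_assignments == part_id).nonzero(as_tuple=True)[0]
        n_mask = int(round(mask_ratio * idx.numel()))
        # Randomly choose which patches of this part to mask.
        chosen = idx[torch.randperm(idx.numel())[:n_mask]]
        mask[chosen] = True
    return mask

# Example: 196 patches (14x14 grid of a 224x224 image) grouped into 6 hypothetical parts.
parts = torch.randint(0, 6, (196,))
mask = semantic_guided_mask(parts, mask_ratio=0.75)
print(mask.sum().item(), "of", mask.numel(), "patches masked")
```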
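
Using the hyper-parameters quoted in the Experiment Setup row (AdamW, lr 2.4e-3, weight decay 0.05, 40 warmup epochs, 800 pre-training epochs, cosine schedule), a minimal sketch of the pre-training optimization loop might look as follows; the placeholder `torch.nn.Linear` module, the exact warmup/cosine formula, and the loop structure are assumptions, since the paper does not publish this configuration code or pin software versions.

```python
import math
import torch

# Hyper-parameters quoted from the paper's experiment setup.
base_lr = 2.4e-3
weight_decay = 0.05
warmup_epochs = 40
total_epochs = 800

# Placeholder module; stands in for the SemMAE encoder/decoder, which is not reproduced here.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr, weight_decay=weight_decay)

def lr_at_epoch(epoch: int) -> float:
    """Linear warmup for 40 epochs, then cosine decay toward zero at epoch 800 (assumed schedule)."""
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

for epoch in range(total_epochs):
    for group in optimizer.param_groups:
        group["lr"] = lr_at_epoch(epoch)
    # ... one pre-training epoch with batch size 4096 and random horizontal flipping ...
```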