SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders
Authors: Gang Li, Heliang Zheng, Daqing Liu, Chaoyue Wang, Bing Su, Changwen Zheng
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various vision tasks show that SemMAE can learn better image representation by integrating semantic information. In particular, SemMAE achieves 84.5% fine-tuning accuracy on ImageNet-1k, which outperforms the vanilla MAE by 1.4%. In the semantic segmentation and fine-grained recognition tasks, SemMAE also brings significant improvements and yields state-of-the-art performance. |
| Researcher Affiliation | Collaboration | Gang Li (1,2), Heliang Zheng (3), Daqing Liu (3), Chaoyue Wang (3), Bing Su (4), Changwen Zheng (1). Affiliations: (1) Institute of Software, Chinese Academy of Sciences; (2) University of Chinese Academy of Sciences; (3) JD Explore Academy; (4) Renmin University of China. Emails: ucasligang@gmail.com, {zhengheliang,liudaqing1,wangchaoyue9}@jd.com, bingsu@ruc.edu.cn, changwen@iscas.ac.cn |
| Pseudocode | Yes | Algorithm 1: Algorithm of Semantic-Guided Masking in a PyTorch-like style (an illustrative sketch of the idea follows this table). |
| Open Source Code | Yes | Our code is available at https://github.com/ucasligang/SemMAE. |
| Open Datasets | Yes | The experiment is performed on the ImageNet-1k [11] dataset. |
| Dataset Splits | No | The paper mentions using ImageNet-1k for pre-training and various downstream tasks, but it does not explicitly specify the training/validation/test dataset splits (e.g., percentages or sample counts) needed to reproduce the experiments. It refers to standard settings but not specific splits for validation. |
| Hardware Specification | Yes | Our model converges fast, taking only 2 hours on one A100 GPU card. ... Our model is trained on 16 A100 GPUs for 3 days. |
| Software Dependencies | No | The paper mentions a 'PyTorch-like style' in Algorithm 1, 'AdamW' as the optimizer, and specific models like 'ViT-small', but does not provide version numbers for software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We follow the most common setting and optimize our model with AdamW [27] using a learning rate of 2.4e-3. The batch size is set to 4096, and the weight decay is set to 0.05. We use a cosine learning rate strategy [26] with warmup [17]. The warmup is set to 40 epochs, and we pre-train our model for 800 epochs. For data augmentation, we only employ random horizontal flipping in the pre-training stage. The hyper-parameter γ in Algorithm 1 is experimentally set to 2. |
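
The paper's Algorithm 1 (semantic-guided masking in a PyTorch-like style) is only cited above, not reproduced. The sketch below is a minimal, illustrative take on the general idea, assuming a per-patch tensor of semantic-part assignments and masking a fixed fraction of patches within each part rather than uniformly at random; the function name, the per-part uniform sampling, and the omission of the γ schedule are assumptions, not the authors' implementation.

```python
import torch

def semantic_guided_mask(part_assignments: torch.Tensor, mask_ratio: float) -> torch.Tensor:
    """Illustrative sketch: mask `mask_ratio` of the patches within each semantic part,
    instead of sampling patches uniformly at random over the whole image (vanilla MAE).

    part_assignments: (N,) long tensor giving a part index for each of N patches.
    Returns a boolean mask of shape (N,) where True marks a masked patch.
    """
    n_patches = part_assignments.numel()
    mask = torch.zeros(n_patches, dtype=torch.bool)
    for part_id in part_assignments.unique():
        # Indices of the patches belonging to this semantic part.
        idx = (part_assignments == part_id).nonzero(as_tuple=True)[0]
        n_mask = int(round(mask_ratio * idx.numel()))
        # Randomly choose which patches of this part to mask.
        chosen = idx[torch.randperm(idx.numel())[:n_mask]]
        mask[chosen] = True
    return mask

# Example: 196 patches (14x14 grid of a 224x224 image) grouped into 6 hypothetical parts.
parts = torch.randint(0, 6, (196,))
mask = semantic_guided_mask(parts, mask_ratio=0.75)
print(mask.sum().item(), "of", mask.numel(), "patches masked")
```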
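
Using the hyper-parameters quoted in the Experiment Setup row (AdamW, lr 2.4e-3, weight decay 0.05, 40 warmup epochs, 800 pre-training epochs, cosine schedule), a minimal sketch of the pre-training optimization loop might look as follows; the placeholder `torch.nn.Linear` module, the exact warmup/cosine formula, and the loop structure are assumptions, since the paper does not publish this configuration code or pin software versions.

```python
import math
import torch

# Hyper-parameters quoted from the paper's experiment setup.
base_lr = 2.4e-3
weight_decay = 0.05
warmup_epochs = 40
total_epochs = 800

# Placeholder module; stands in for the SemMAE encoder/decoder, which is not reproduced here.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr, weight_decay=weight_decay)

def lr_at_epoch(epoch: int) -> float:
    """Linear warmup for 40 epochs, then cosine decay toward zero at epoch 800 (assumed schedule)."""
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

for epoch in range(total_epochs):
    for group in optimizer.param_groups:
        group["lr"] = lr_at_epoch(epoch)
    # ... one pre-training epoch with batch size 4096 and random horizontal flipping ...
```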