MST: Masked Self-Supervised Transformer for Visual Representation
Authors: Zhaowen Li, Zhiyang Chen, Fan Yang, Wei Li, Yousong Zhu, Chaoyang Zhao, Rui Deng, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments on multiple datasets demonstrate the effectiveness and generality of the proposed method. |
| Researcher Affiliation | Collaboration | National Laboratory of Pattern Recognition, Institute of Automation, CAS; School of Artificial Intelligence, University of Chinese Academy of Sciences; SenseTime Research; University of California, Los Angeles |
| Pseudocode | Yes | Algorithm 1: Pseudo code of attention-guided mask strategy in a PyTorch-like style. (A hedged sketch of this strategy appears below the table.) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | Dataset and Models Our method is validated on the popular ImageNet-1k dataset [9]. ... We perform object detection experiments with the MS COCO [18] dataset and the Mask R-CNN detector [15] framework. ... SETR [37] provides a semantic segmentation framework for the standard Vision Transformer. Hence, we adopt SETR as the semantic segmentation strategy on Cityscapes [8]. |
| Dataset Splits | Yes | This dataset contains 1.28M images in the training set and 50K images in the validation set from 1000 classes. We only use the training set during the process of self-supervised learning. ... MS COCO is a popular benchmark for object detection, with 118K images in the training set and 5K images in the validation set. ... Cityscapes contains 5000 images, with 19 object categories annotated at the pixel level. There are 2975, 500, and 1525 images in the training, validation, and test sets, respectively. |
| Hardware Specification | Yes | Throughput (im/s) is calculated on a single NVIDIA V100 GPU with batch size 128. (A timing sketch appears below the table.) |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | Our model is optimized by AdamW [22] with learning rate 2×10⁻³ and batch size 1024. Weight decay is set to 0.04. We adopt learning rate warmup [12] in the first 10 epochs, and after warmup the learning rate follows a cosine decay schedule [21]. The model uses multi-crop similar to [1] and data augmentations similar to [13]. The settings of momentum, temperature coefficient, and weight decay follow [2]. The coefficient λ1 of the basic instance discrimination task is set to 1.0, while the coefficient λ2 of the restoration task is set to 0.6. (An optimizer-and-schedule sketch appears below the table.) |
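
The Pseudocode row points to the paper's Algorithm 1, an attention-guided mask strategy given in a PyTorch-like style. The listing below is a minimal sketch of that idea, not the authors' code: it masks a fraction of patch tokens drawn only from the patches that receive the least class-token attention, so semantically important regions stay visible for the restoration task. The function name, the `mask_ratio` default, and the bottom-half candidate pool are illustrative assumptions.

```python
import torch

def attention_guided_mask(attn, mask_ratio=0.15):
    """Mask patches with low class-token attention.

    attn: (B, N) tensor, e.g. the teacher's last-layer class-token
          attention averaged over heads; N is the number of patches.
    Returns a (B, N) boolean mask, True where a patch is masked.
    """
    B, N = attn.shape
    num_mask = int(N * mask_ratio)
    # Sort patches by attention, ascending: low-attention patches first.
    order = attn.argsort(dim=1)
    # Candidate pool: the least-attended half of the patches (assumption).
    candidates = order[:, : N // 2]
    # Randomly choose num_mask patches per image from the candidate pool.
    pick = torch.rand(B, candidates.size(1), device=attn.device)
    chosen = candidates.gather(1, pick.argsort(dim=1)[:, :num_mask])
    mask = torch.zeros(B, N, dtype=torch.bool, device=attn.device)
    mask.scatter_(1, chosen, True)
    return mask
```

This sketch only covers mask selection; in MST the masked positions are then replaced (e.g. with a learnable mask token) before the student forward pass, and a decoder restores the original image from the token sequence.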
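The Hardware Specification row describes how throughput is measured. A simple way to reproduce such a number is the benchmark sketch below; the input resolution, warmup, and iteration counts are assumptions, since the paper states only the GPU model and batch size.

```python
import time
import torch

@torch.no_grad()
def throughput_images_per_sec(model, batch_size=128, image_size=224,
                              warmup=10, iters=30, device="cuda"):
    """Measure inference throughput (im/s) with synthetic inputs."""
    model = model.eval().to(device)
    x = torch.randn(batch_size, 3, image_size, image_size, device=device)
    for _ in range(warmup):        # warm up CUDA kernels and caches
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()       # wait for all queued kernels to finish
    return batch_size * iters / (time.time() - start)
```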
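The Experiment Setup row lists the optimization hyperparameters: AdamW, learning rate 2×10⁻³, batch size 1024, weight decay 0.04, linear warmup for the first 10 epochs, then cosine decay. The sketch below wires those values together; the total epoch count and the per-epoch step granularity are assumptions not stated in the quoted text.

```python
import math
import torch

def build_optimizer_and_schedule(model, total_epochs=100, warmup_epochs=10,
                                 base_lr=2e-3, weight_decay=0.04):
    """AdamW with linear warmup followed by cosine decay (stepped per epoch)."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr,
                                  weight_decay=weight_decay)

    def lr_lambda(epoch):
        if epoch < warmup_epochs:
            return (epoch + 1) / warmup_epochs            # linear warmup
        progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
        return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```

Per the quoted setup, the training objective would then combine the two tasks as L = λ1·L_instance + λ2·L_restoration with λ1 = 1.0 and λ2 = 0.6.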