Self-Supervised Visual Representation Learning with Semantic Grouping

Authors: Xin Wen, Bingchen Zhao, Anlin Zheng, Xiangyu Zhang, Xiaojuan Qi

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show our approach effectively decomposes complex scenes into semantic groups for feature learning and significantly benefits downstream tasks, including object detection, instance segmentation, and semantic segmentation. Code is available at: https://github.com/CVMI-Lab/SlotCon. We extensively assess the representation learning ability of our model by conducting transfer learning evaluation on COCO [46] object detection, instance segmentation, and semantic segmentation on Cityscapes [13], PASCAL VOC [20], and ADE20K [83].
Researcher Affiliation | Collaboration | Xin Wen (1), Bingchen Zhao (2, 3), Anlin Zheng (1, 4), Xiangyu Zhang (4), Xiaojuan Qi (1); affiliations: (1) University of Hong Kong, (2) University of Edinburgh, (3) LunarAI, (4) MEGVII Technology.
Pseudocode | No | The paper describes the proposed method textually and with equations, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Code is available at: https://github.com/CVMI-Lab/SlotCon.
Open Datasets | Yes | We pre-train our models on COCO train2017 [46] and ImageNet-1K [15], respectively. Object detection and instance segmentation on COCO [46], and semantic segmentation on PASCAL VOC [20], Cityscapes [13], and ADE20K [83].
Dataset Splits | Yes | We fine-tune all layers end-to-end on the COCO train2017 split with the standard 1x schedule and report AP, AP50, AP75 on the val2017 split. For PASCAL VOC, we fine-tune the model on the train_aug2012 set for 30k iterations and report the mean intersection over union (mIoU) on the val2012 set. For Cityscapes, we fine-tune on the train_fine set for 90k iterations and evaluate it on the val_fine set.
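
For quick reference, the fine-tuning splits and schedules quoted above can be summarized as a small Python mapping. This is an illustrative restatement of the quoted text only, not code from the paper or its repository; the variable name and structure are our own.

# Illustrative summary (not from the SlotCon codebase) of the quoted
# transfer-learning splits and schedules.
TRANSFER_EVAL_SPLITS = {
    "COCO detection / instance segmentation": {
        "train_split": "train2017",
        "eval_split": "val2017",
        "schedule": "standard 1x",
        "metrics": ["AP", "AP50", "AP75"],
    },
    "PASCAL VOC semantic segmentation": {
        "train_split": "train_aug2012",
        "eval_split": "val2012",
        "iterations": 30_000,
        "metrics": ["mIoU"],
    },
    "Cityscapes semantic segmentation": {
        "train_split": "train_fine",
        "eval_split": "val_fine",
        "iterations": 90_000,
        "metrics": ["mIoU"],
    },
}
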
Hardware Specification | Yes | We adopt the LARS optimizer [77] to pre-train the model, with a batch size of 512 across eight NVIDIA 2080 Ti GPUs.
Software Dependencies | No | The paper mentions software such as 'Detectron2' and 'MMSegmentation' but does not provide specific version numbers for these or other key software components.
Experiment Setup | Yes | We adopt the LARS optimizer [77] to pre-train the model, with a batch size of 512 across eight NVIDIA 2080 Ti GPUs. Following [73], we utilize the cosine learning rate decay schedule [50] with a base learning rate of 1.0, linearly scaled with the batch size (Learning Rate = 1.0 x Batch Size / 256), a weight decay of 10^-5, and a warm-up period of 5 epochs. ... The temperature values τ_s and τ_t in the student and teacher model are set to 0.1 and 0.07, respectively. Besides, the center momentum λ_c is set to 0.9. The default number of prototypes K is set to 256 for COCO(+) and 2048 for ImageNet... The temperature value τ_c for the contrastive loss is set to 0.2... and the default balancing ratio λ_g is set to 0.5.
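
The linear learning-rate scaling rule and the hyperparameters quoted above can be collected into a short configuration sketch; with the stated batch size of 512, the scaled learning rate works out to 1.0 x 512 / 256 = 2.0. This is a minimal sketch assuming only what is quoted in this row; the function and key names below are illustrative and are not taken from the released SlotCon code.

# Minimal sketch of the quoted pre-training setup; `build_pretrain_config`
# and all key names are illustrative, not the authors' API.

def scaled_learning_rate(base_lr: float, batch_size: int) -> float:
    # Linear scaling rule quoted above: lr = base_lr * batch_size / 256.
    return base_lr * batch_size / 256

def build_pretrain_config(dataset: str = "COCO") -> dict:
    return {
        "optimizer": "LARS",
        "batch_size": 512,
        "base_lr": 1.0,
        "lr": scaled_learning_rate(1.0, 512),  # = 2.0 for batch size 512
        "lr_schedule": "cosine decay",
        "warmup_epochs": 5,
        "weight_decay": 1e-5,
        "tau_student": 0.1,       # τ_s
        "tau_teacher": 0.07,      # τ_t
        "center_momentum": 0.9,   # λ_c
        "num_prototypes": 2048 if dataset == "ImageNet" else 256,  # 256 for COCO(+)
        "tau_contrastive": 0.2,   # τ_c
        "balancing_ratio": 0.5,   # λ_g
    }

if __name__ == "__main__":
    print(build_pretrain_config("COCO"))
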