Looking Beyond Single Images for Contrastive Semantic Segmentation Learning

Authors: Feihu Zhang, Philip Torr, René Ranftl, Stephan Richter

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that contrastive learning with our auxiliary-labeling approach consistently boosts semantic segmentation accuracy compared to standard ImageNet pre-training, and outperforms existing contrastive and semi-supervised approaches to semantic segmentation.
Researcher Affiliation | Collaboration | Feihu Zhang, Philip Torr (University of Oxford); René Ranftl, Stephan R. Richter (Intel Labs)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide a concrete access link or an explicit statement about releasing source code for the described methodology.
Open Datasets | Yes | NYUv2 [44] contains about 500,000 unlabeled and 1,449 labeled images... Cityscapes [17] contains more than 100,000 unlabeled video frames... ADE20K [62] consists of 20,000 training and 2,000 validation images, all of which are fully annotated. Due to the lack of unlabeled images, we supplement ADE20K with MS-COCO [36] (without using MS-COCO's annotations).
Dataset Splits | Yes | The labeled set is annotated with 40 classes and split into 795 train and 654 test images. Cityscapes [17] contains more than 100,000 unlabeled video frames of driving scenes, 2,975 labeled images for training and another 500 annotated images for validation. ADE20K [62] consists of 20,000 training and 2,000 validation images, all of which are fully annotated.
Hardware Specification | Yes | All experiments were conducted on 8 Nvidia Quadro 6000 GPUs.
Software Dependencies | No | The paper mentions using SGD with momentum [43] and augmentations from SimCLR [14] and Cutout [18], but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | We use batches of 128 input images and set the crop size to 321×361. The memory bank for implementing momentum contrast holds the features of 384 frames. All models are trained using SGD with momentum [43]. We set weight decay to 1e-4 and the momentum term to 0.9. We perform warmup with a learning rate of 0.01 for two epochs and then train for another 18 epochs with a learning rate of 0.1. The learning rate is reduced by a factor of 10 after epochs 10, 15, and 18, respectively. We use the augmentations defined in SimCLR [14], which include random scaling, rotation, cropping, and color transformations, together with Cutout [18], where we fill the cut-out region with the mean value of the cut-out.
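The training schedule and the Cutout variant quoted above are specific enough to sketch in code. The following is a minimal, framework-free illustration, not the authors' implementation: `learning_rate` reproduces the described piecewise-constant schedule (warmup at 0.01 for two epochs, then 0.1, reduced by 10x after epochs 10, 15, and 18), and `cutout_mean_fill` shows the mean-fill Cutout variant on a single-channel image. All function names and the 2-D-list image representation are illustrative assumptions.

```python
import random

def learning_rate(epoch, warmup_epochs=2, warmup_lr=0.01, base_lr=0.1,
                  decay_epochs=(10, 15, 18), decay_factor=0.1):
    """Sketch of the paper's schedule: warmup at 0.01 for two epochs,
    then 0.1, reduced by a factor of 10 after epochs 10, 15, and 18."""
    if epoch < warmup_epochs:
        return warmup_lr
    lr = base_lr
    for e in decay_epochs:
        if epoch > e:  # "reduced after epoch e"
            lr *= decay_factor
    return lr

def cutout_mean_fill(image, hole_h, hole_w, rng=None):
    """Cutout variant described in the paper: erase a random rectangle
    and fill it with the mean value of the erased region.
    `image` is a 2-D list of floats (single channel, for illustration)."""
    rng = rng or random.Random(0)
    h, w = len(image), len(image[0])
    top = rng.randrange(0, h - hole_h + 1)
    left = rng.randrange(0, w - hole_w + 1)
    patch = [image[r][c]
             for r in range(top, top + hole_h)
             for c in range(left, left + hole_w)]
    mean = sum(patch) / len(patch)
    out = [row[:] for row in image]  # copy, leave the input untouched
    for r in range(top, top + hole_h):
        for c in range(left, left + hole_w):
            out[r][c] = mean
    return out
```

In practice the schedule would be wired into an optimizer (e.g. by updating the learning rate at each epoch boundary) and the augmentation applied per crop; how exactly the paper interprets "after epoch e" (inclusive or exclusive) is not stated, so the strict comparison above is one reasonable reading.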