ReCo: Retrieve and Co-segment for Zero-shot Transfer

Authors: Gyungin Shin, Weidi Xie, Samuel Albanie

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that our approach, termed Retrieve and Co-segment (ReCo), performs favourably to conventional unsupervised segmentation approaches while inheriting the convenience of nameable predictions and zero-shot transfer. We also demonstrate ReCo's ability to generate specialist segmenters for extremely rare objects. In this section, we start by describing the datasets used for our experiments (Sec. 4.1) and implementation details (Sec. 4.2). Then, we conduct an ablation study (Sec. 4.3) and compare our model to state-of-the-art methods for unsupervised semantic segmentation with and without language-image pre-training (Sec. 4.4). Finally, we showcase our model's ability to segment rare-category objects (Sec. 4.5).
Researcher Affiliation | Academia | Gyungin Shin (1), Weidi Xie (1, 2), Samuel Albanie (3); 1: Visual Geometry Group, University of Oxford, UK; 2: Coop. Medianet Innovation Center, Shanghai Jiao Tong University, China; 3: Department of Engineering, University of Cambridge, UK
Pseudocode | Yes | Pseudocode for ReCo can be found in the supplementary material.
Open Source Code | Yes | Our implementation is based on the PyTorch library [66] and made publicly available. Code available at https://github.com/NoelShin/reco
Open Datasets | Yes | For our ablation study, we use the ImageNet1K [16] validation set to curate an archive for concepts of interest... We evaluate on standard benchmarks including the Cityscapes [14] validation split... KITTI-STEP [87] validation set... and COCO-Stuff [6] validation split... We emphasise that no ground-truth labels are used for training. Finally, to demonstrate our model's ability to segment rare concepts, we use the LAION-5B dataset [75] with 5 billion images... To assess performance, we use the FireNet dataset [62]...
Dataset Splits | Yes | For our ablation study, we use the ImageNet1K [16] validation set to curate an archive for concepts of interest... To measure segmentation performance in the zero-shot transfer setting, we use the PASCAL-Context [58] validation set for evaluation... We evaluate on standard benchmarks including the Cityscapes [14] validation split... KITTI-STEP [87] validation set... and COCO-Stuff [6] validation split... For unsupervised adaptation with ReCo+ (Sec. 3.2), we train on ReCo pseudo-labels using the Cityscapes training set with 2,975 images, the KITTI-STEP training set, which contains 5,027 images, and the COCO-Stuff10K subset, which has 9,000 images, for each respective benchmark.
Hardware Specification | Yes | Training consists of 20K gradient iterations with a batch size of 8 and takes about 5 hours on a single 24GB NVIDIA P40 GPU.
Software Dependencies | No | The paper mentions the "PyTorch library [66]" but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | All training images are resized and center-cropped to 320×320 pixels, and data augmentations such as random scaling, cropping, and horizontal flipping are applied with random color jittering and Gaussian blurring. We use the Adam optimiser [43] with an initial learning rate of 5×10⁻⁴ and a weight decay of 2×10⁻⁴ with the Poly learning rate schedule as in [50, 10]. Training consists of 20K gradient iterations with a batch size of 8 and takes about 5 hours on a single 24GB NVIDIA P40 GPU.
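
As a rough sketch of how the reported hyper-parameters fit together, the PyTorch snippet below wires up the described augmentations, the Adam optimiser, and a Poly learning rate schedule. The augmentation magnitudes, the Poly power of 0.9, and the placeholder one-layer model are illustrative assumptions; none of them are specified in the quoted text.

```python
from torch import nn, optim
from torchvision import transforms

# Placeholder model: a single 1x1 conv standing in for the actual
# segmentation network (assumption; not the architecture used by ReCo+).
model = nn.Conv2d(in_channels=3, out_channels=27, kernel_size=1)

# Augmentations as described: resize/center-crop to 320x320, random
# scaling and cropping, horizontal flipping, colour jitter, and Gaussian
# blur. All magnitudes below are assumed values, not taken from the paper.
train_transform = transforms.Compose([
    transforms.Resize(320),
    transforms.CenterCrop(320),
    transforms.RandomResizedCrop(320, scale=(0.5, 2.0)),  # random scaling + cropping
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.GaussianBlur(kernel_size=5),
    transforms.ToTensor(),
])

# Adam with the reported initial learning rate and weight decay.
optimiser = optim.Adam(model.parameters(), lr=5e-4, weight_decay=2e-4)

# Poly schedule: lr_t = lr_0 * (1 - t / T)^p. The power p = 0.9 is the
# common choice in the cited works [50, 10]; the quoted text does not
# state it explicitly.
total_iters = 20_000  # 20K gradient iterations
scheduler = optim.lr_scheduler.LambdaLR(
    optimiser, lr_lambda=lambda t: (1.0 - t / total_iters) ** 0.9
)

for step in range(total_iters):
    # images, pseudo_labels = next(loader)   # batch size 8 in the paper
    # loss = criterion(model(images), pseudo_labels); loss.backward()
    optimiser.step()
    scheduler.step()
    optimiser.zero_grad()
```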