Unsupervised Semantic Segmentation with Self-supervised Object-centric Representations

Authors: Andrii Zadaianchuk, Matthaeus Kleindessner, Yi Zhu, Francesco Locatello, Thomas Brox

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We present results on PASCAL VOC that go far beyond the current state of the art (50.0 mIoU), and we report for the first time results on MS COCO for the whole set of 81 classes: our method discovers 34 categories with more than 20% IoU, while obtaining an average IoU of 19.6 for all 81 categories. (Section 4, EXPERIMENTS)
Researcher Affiliation Collaboration Andrii Zadaianchuk1,3, Matthaeus Kleindessner2, Yi Zhu2, Francesco Locatello2, Thomas Brox2,4. 1Max Planck Institute for Intelligent Systems, Tübingen, Germany; 2Amazon Web Services; 3Department of Computer Science, ETH Zürich; 4University of Freiburg
Pseudocode Yes Algorithm 1: Object Categories Discovery for Unsupervised Pseudo-Masks Estimation; Algorithm 2: Self-training with Noisy Pseudo-Masks
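A minimal, hedged sketch of the control flow that these two algorithms describe, assuming pooled object-centric features as input; the variable names and the k-NN stand-in for the segmentation student are illustrative assumptions, not the authors' implementation:

    import numpy as np
    from sklearn.cluster import SpectralClustering
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    features = rng.normal(size=(200, 384))  # toy stand-in for pooled DINO object features

    # Algorithm 1 (sketch): discover object categories by clustering the
    # object-centric features; cluster ids become pseudo-mask categories.
    pseudo_labels = SpectralClustering(
        n_clusters=20, affinity="nearest_neighbors", n_neighbors=30
    ).fit_predict(features)

    # Algorithm 2 (sketch): self-training with noisy pseudo-masks. A k-NN
    # classifier stands in for the DeepLabv3 student; each iteration refits
    # on the current pseudo-labels and regenerates them from its predictions.
    for _ in range(2):
        student = KNeighborsClassifier(n_neighbors=5).fit(features, pseudo_labels)
        pseudo_labels = student.predict(features)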
Open Source Code Yes The COMUS implementation code is located here: https://github.com/zadaianchuk/comus.
Open Datasets Yes PASCAL VOC: The PASCAL Visual Object Classes (VOC) project (Everingham et al., 2015)... All datasets and detailed descriptions are available on the PASCAL VOC homepage (http://host.robots.ox.ac.uk/pascal/VOC/index.html). MS COCO: We also apply our method to the Microsoft (MS) COCO dataset (Lin et al., 2014). The dataset and information about it are available on https://cocodataset.org/#home. ImageNet: For feature extraction, we use vision transformers pretrained with the self-supervised (no labels!) DINO method (Caron et al., 2021) on ImageNet (Deng et al., 2009). The pretrained checkpoint can be found on https://github.com/facebookresearch/dino. Information about ImageNet is provided on https://image-net.org/. MSRA-B: For computing saliency masks, we use BasNet (Qin et al., 2019) pretrained on pseudo-labels generated with the unsupervised DeepUSPS (Nguyen et al., 2019) outputs on the MSRA-B dataset (Wang et al., 2017). The pretrained checkpoint can be found on https://github.com/wvangansbeke/Unsupervised-Semantic-Segmentation/tree/main/saliency. The dataset and information about it are available on https://mmcheng.net/msra10k/.
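As a hedged illustration of the feature-extraction step above, the DINO repository exposes its pretrained checkpoints through torch.hub; the dino_vits16 entry point loads the ViT-S/16 model:

    import torch

    # Load the self-supervised DINO ViT-S/16 checkpoint from the repo linked above.
    backbone = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
    backbone.eval()

    # Placeholder input; real usage assumes an ImageNet-normalized RGB image.
    image = torch.randn(1, 3, 224, 224)
    with torch.no_grad():
        cls_feature = backbone(image)  # (1, 384) CLS embedding for ViT-S/16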
Dataset Splits Yes Evaluation Setting We tested the proposed approach on two semantic object segmentation datasets, PASCAL VOC (Everingham et al., 2012) and MS COCO (Lin et al., 2014). ... we used the ground truth segmentation masks only for testing but not for any training. ... We computed ... on the PASCAL 2012 train set. ... PASCAL 2012 trainaug set (10582 images) is an extension of the original train set (1464 images).
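Since the evaluation reports per-class IoU against the held-out ground-truth masks, here is a minimal sketch of that metric on toy label maps; it illustrates the standard definition, not the paper's evaluation code:

    import numpy as np

    def per_class_iou(pred, gt, num_classes):
        # Intersection-over-union per class id; NaN where a class is absent.
        ious = np.full(num_classes, np.nan)
        for c in range(num_classes):
            inter = np.logical_and(pred == c, gt == c).sum()
            union = np.logical_or(pred == c, gt == c).sum()
            if union > 0:
                ious[c] = inter / union
        return ious

    # Toy example with 21 labels (20 PASCAL VOC categories plus background).
    pred = np.random.randint(0, 21, size=(4, 64, 64))
    gt = np.random.randint(0, 21, size=(4, 64, 64))
    print(np.nanmean(per_class_iou(pred, gt, 21)))  # mean IoU over present classes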
Hardware Specification Yes In particular, training DINO with Vision Transformers takes 3 days on two 8-GPU servers... DeepUSPS could be trained in around 30 hours of computation time on four older GeForce Titan X GPUs (Nguyen et al., 2019), while BasNet could be trained with four GTX 1080 Ti GPUs (with 11GB memory each) in around 30 hours... for convenience we perform all the experiments on one node with 4 NVIDIA T4 GPUs
Software Dependencies No The paper mentions software like 'sklearn', 'DeepLabv3', 'DINO', and 'BasNet', and provides links to their GitHub repositories or official pages. However, it does not explicitly list version numbers for these key software components or libraries.
Experiment Setup Yes
Table 15: Spectral clustering parameters for COMUS on PASCAL VOC and MS COCO datasets.
Hyper-parameter | PASCAL VOC | MS COCO
Number of clusters | 20 | 80
Number of components | 20 | 80
Affinity | nearest neighbors | nearest neighbors
Number of neighbors | 30 | 30
Table 16: Self-training parameters for COMUS on PASCAL VOC and MS COCO datasets.
Hyper-parameter | PASCAL VOC | MS COCO
Optimizer | Adam with default settings | Adam with default settings
Learning rate | 0.00006 | 0.00006
Batch size | 56 | 56
Input size | 512 | 256
Crop scales | [0.5, 2] | [0.2, 1.0]
Number of iterations | 2 | 1
Number of epochs (iteration 1) | 10 | 1
Number of epochs (iteration 2) | 5 | -
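The tabulated hyper-parameters map directly onto library objects. Below is a hedged sketch for the PASCAL VOC column, using sklearn (which the paper mentions) and a torchvision DeepLabv3; the ResNet-50 backbone and the 21-class output (20 discovered categories plus background) are assumptions that the tables do not fix:

    import torch
    from sklearn.cluster import SpectralClustering
    from torchvision.models.segmentation import deeplabv3_resnet50

    # Table 15, PASCAL VOC column: spectral clustering over object features.
    voc_clustering = SpectralClustering(
        n_clusters=20,                 # Number of clusters
        n_components=20,               # Number of components
        affinity="nearest_neighbors",  # Affinity
        n_neighbors=30,                # Number of neighbors
    )

    # Table 16, PASCAL VOC column: self-training optimizer for the student.
    # weights=None / weights_backbone=None avoid downloading any checkpoints.
    model = deeplabv3_resnet50(weights=None, weights_backbone=None, num_classes=21)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.00006)  # Learning rate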