Self-supervised Semantic Segmentation Grounded in Visual Concepts

Authors: Wenbin He, William Surmeier, Arvind Kumar Shekar, Liang Gou, Liu Ren

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the learned pixel embeddings and visual concepts on three datasets, including PASCAL VOC 2012, COCO 2017, and DAVIS 2017. Our results show that the proposed method gains consistent and substantial improvements over recent unsupervised semantic segmentation approaches.
Researcher Affiliation | Industry | (1) Robert Bosch Research and Technology Center North America; (2) Robert Bosch GmbH
Pseudocode | No | No pseudocode or algorithm blocks were found.
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | We mainly experiment on the Pascal VOC 2012 dataset, which contains 20 object classes and one background class. Following the prior work [Hwang et al., 2019], we train networks on the train_aug set with 10,582 images and test on the val set with 1,449 images. We also perform experiments on COCO 2017 and DAVIS 2017 to evaluate the generalizability of the learned pixel embeddings.
Dataset Splits | Yes | Following the prior work [Hwang et al., 2019], we train networks on the train_aug set with 10,582 images and test on the val set with 1,449 images.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are mentioned in the paper.
Software Dependencies | No | The paper does not provide version numbers for software dependencies or libraries (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | For self-supervised pre-training, the hyperparameters are set as follows. The embedding dimension is set to 32, and the concentration constant κ is set to 10. For VQ, we use a dictionary of size 512 and set the commitment constant β to 0.5. The weights λs, λv, and λo of each loss term are set to 1, 2, and 1, respectively. We train the network on the train_aug set of Pascal VOC 2012 for 5k iterations with a batch size of 8. We set the initial learning rate to 0.001 and decay it with a poly learning rate policy.
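
The Open Datasets and Dataset Splits rows above describe the PASCAL VOC 2012 train_aug/val protocol (10,582 training images, 1,449 validation images). Since the paper releases no code, the following is a minimal loading sketch, assuming PyTorch/torchvision and a placeholder DATA_ROOT; the framework choice and directory layout are assumptions, not details stated in the paper.

# Minimal sketch (not the authors' code), assuming torchvision is installed and
# the datasets have been downloaded under DATA_ROOT (placeholder path).
from torchvision import datasets

DATA_ROOT = "/path/to/datasets"  # hypothetical root directory

# Official PASCAL VOC 2012 segmentation splits: 1,464 train / 1,449 val images.
voc_train = datasets.VOCSegmentation(root=DATA_ROOT, year="2012", image_set="train")
voc_val = datasets.VOCSegmentation(root=DATA_ROOT, year="2012", image_set="val")

# SBD ("Berkeley augmented") segmentation annotations. The 10,582-image
# train_aug set used in the paper is conventionally built by merging the VOC
# 2012 train split with the SBD annotations and dropping any image that also
# appears in the VOC val split.
sbd = datasets.SBDataset(root=DATA_ROOT, image_set="train", mode="segmentation")

print(len(voc_train), len(voc_val))  # expected: 1464 1449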
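
The Experiment Setup row quotes the paper's pre-training hyperparameters. As a reading aid only, here is a hypothetical configuration sketch that collects those values; the PretrainConfig and poly_lr names are invented for illustration, the poly decay power of 0.9 is a common default that the paper does not state, and only the numeric values come from the quoted text.

# Hypothetical configuration sketch; only the numeric values are taken from the
# paper's experiment-setup description. Names and structure are placeholders.
from dataclasses import dataclass

@dataclass
class PretrainConfig:
    embedding_dim: int = 32    # pixel-embedding dimension
    kappa: float = 10.0        # concentration constant κ
    codebook_size: int = 512   # VQ dictionary size
    beta: float = 0.5          # VQ commitment constant β
    lambda_s: float = 1.0      # loss weight λs
    lambda_v: float = 2.0      # loss weight λv
    lambda_o: float = 1.0      # loss weight λo
    iterations: int = 5000     # training iterations on VOC 2012 train_aug
    batch_size: int = 8
    base_lr: float = 1e-3      # initial learning rate

def poly_lr(base_lr: float, step: int, max_steps: int, power: float = 0.9) -> float:
    # Poly learning-rate policy: lr = base_lr * (1 - step / max_steps) ** power.
    # The paper states poly decay but not the power; 0.9 is an assumed default.
    return base_lr * (1.0 - step / max_steps) ** power

cfg = PretrainConfig()
print(poly_lr(cfg.base_lr, step=2500, max_steps=cfg.iterations))  # LR at mid-training

Under these values the overall objective would presumably be weighted as λs·Ls + λv·Lv + λo·Lo = Ls + 2·Lv + Lo, matching the 1/2/1 weighting quoted above.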