Universal Weakly Supervised Segmentation by Pixel-to-Segment Contrastive Learning

Authors: Tsung-Wei Ke, Jyh-Jing Hwang, Stella Yu

ICLR 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Our experiments on Pascal VOC and DensePose demonstrate consistent gains over the state-of-the-art (SOTA), and the gain is substantial especially for the sparsest keypoint supervision." |
| Researcher Affiliation | Academia | Tsung-Wei Ke, Jyh-Jing Hwang, Stella X. Yu; UC Berkeley / ICSI; {twke,jyh,stellayu}@berkeley.edu |
| Pseudocode | Yes | Algorithm 1: Inference procedure for semantic segmentation using scribble / point / bounding box annotations. Algorithm 2: Inference procedure for semantic segmentation using image-level tags. |
| Open Source Code | Yes | "Our code is publicly available at https://github.com/twke18/SPML." |
| Open Datasets | Yes | Pascal VOC 2012 (Everingham et al., 2010) includes 20 object categories and one background class. Following Chen et al. (2017), we use the augmented training set with 10,582 images and validation set with 1,449 images. DensePose (Alp Güler et al., 2018) is a human pose parsing dataset based on MSCOCO (Lin et al., 2014). |
| Dataset Splits | Yes | "Following Chen et al. (2017), we use the augmented training set with 10,582 images and validation set with 1,449 images." |
| Hardware Specification | No | "For conducting experiments, we take advantage of XSEDE infrastructure (Towns et al., 2014) that includes Bridges resources (Nystrom et al., 2015)." |
| Software Dependencies | No | The paper mentions using DeepLab, PSPNet, ResNet-101, and ImageNet pre-training as backbone/pre-training choices but does not specify software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | On the Pascal VOC dataset, we set batch size to 12 and 16 for scribble / point and image tag / bounding box annotations, respectively. On the DensePose dataset, batch size is set to 16. For all experiments, we train our models with 512×512 crop size. Following Chen et al. (2017), we adopt the poly learning rate policy, multiplying the base learning rate by (1 - iter/max_iter)^0.9. We set the initial learning rate to 0.003 and momentum to 0.9. For the hyper-parameters in the SegSort framework, we use unit-length normalized embeddings of dimension 64 and 32 on VOC and DensePose, respectively. We iterate K-Means clustering for 10 iterations and generate 36 and 144 clusters on the VOC and DensePose datasets. We set the concentration parameter κ to different values for semantic annotation, low-level image similarity, semantic co-occurrence, and feature affinity, respectively. Moreover, λI, λO, and λA are set to different values according to the type of annotation and dataset; λC is set to 1 in all experiments. The detailed hyper-parameter settings are summarized in Table 5. We train for 30k and 45k iterations on VOC and DensePose, respectively. |
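The poly learning-rate schedule quoted in the Experiment Setup row is simple enough to verify directly. Below is a minimal sketch; the `poly_lr` helper is hypothetical (not from the SPML codebase), and only the base learning rate (0.003), the decay power (0.9), and the 30k VOC iteration budget are taken from the table.

```python
def poly_lr(base_lr: float, iteration: int, max_iter: int, power: float = 0.9) -> float:
    """Poly policy: scale the base learning rate by (1 - iter/max_iter)**power."""
    return base_lr * (1.0 - iteration / max_iter) ** power

# With the reported VOC settings (base LR 0.003, 30k training iterations),
# the rate starts at the base value and decays smoothly to 0 at the last step.
print(poly_lr(0.003, 0, 30000))      # 0.003
print(poly_lr(0.003, 30000, 30000))  # 0.0
```

The schedule decays monotonically, so intermediate iterations always yield a rate strictly between the endpoints.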