PseudoSeg: Designing Pseudo Labels for Semantic Segmentation

Authors: Yuliang Zou, Zizhao Zhang, Han Zhang, Chun-Liang Li, Xiao Bian, Jia-Bin Huang, Tomas Pfister

ICLR 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "4 EXPERIMENTAL RESULTS" |
| Researcher Affiliation | Collaboration | Yuliang Zou¹, Zizhao Zhang², Han Zhang³, Chun-Liang Li², Xiao Bian², Jia-Bin Huang¹, Tomas Pfister² — ¹Virginia Tech, ²Google Cloud AI, ³Google Brain |
| Pseudocode | No | The paper describes the method with diagrams (e.g., Figure 6) but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | Yes | "The source code is available at https://github.com/googleinterns/wss." |
| Open Datasets | Yes | "To evaluate the proposed method, we conduct the main experiments and ablation studies on the PASCAL VOC 2012 dataset (VOC12) (Everingham et al., 2015), which contains 21 classes including background. ... we also conduct experiments on the COCO dataset (Lin et al., 2014)." |
| Dataset Splits | Yes | "The standard VOC12 dataset has 1,449 images as the training set and 1,456 images as the validation set. We randomly subsample 1/2, 1/4, 1/8, and 1/16 of images in the standard training set to construct the pixel-level labeled data. ... The COCO dataset has 118,287 images as the training set, and 5,000 images as the validation set. We randomly subsample smaller ratios, 1/32, 1/64, 1/128, 1/256, 1/512, of images from the training set to construct the pixel-level labeled data." |
| Hardware Specification | No | The paper mentions "using 16 GPUs" but does not specify the GPU models, CPUs, or any other hardware used for the experiments. |
| Software Dependencies | No | The paper states that the method is implemented on top of the DeepLab codebase (TensorFlow), but it does not give explicit version numbers for this or any other software dependency such as Python, TensorFlow, or CUDA. |
| Experiment Setup | Yes | "Unless specified, we adopt the DeepLabv3+ model with Xception-65 (Chollet, 2017) as the feature backbone ... We train our model following the default hyper-parameters (e.g., an initial learning rate of 0.007 with a polynomial learning rate decay schedule, a crop size of 513x513, and an encoder output stride of 16), using 16 GPUs. We use a batch size of 4 for each GPU for pixel-level labeled data, and 4 for unlabeled/image-level labeled data. For VOC12, we train the model for 30,000 iterations. For COCO, we train the model for 200,000 iterations. We set γ = 0.5 and T = 0.5 unless specified." |
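The split ratios and training schedule quoted above can be sketched numerically. The following is an illustrative Python sketch, not code from the paper's repository: it assumes floor rounding for the subsampled split sizes and a polynomial decay power of 0.9 (the DeepLab codebase default, which the summary above does not state).

```python
# Illustrative sketch of the quantitative details summarized above.
# Assumptions (not stated in the table): floor rounding for subsampled
# split sizes, and a polynomial decay power of 0.9 (DeepLab default).

def subsample_sizes(total, ratios):
    """Approximate number of pixel-labeled images per subsampling ratio."""
    return {f"1/{r}": total // r for r in ratios}

def poly_lr(step, base_lr=0.007, max_steps=30_000, power=0.9):
    """Polynomial learning-rate decay, as used by the DeepLab codebase."""
    return base_lr * (1.0 - step / max_steps) ** power

voc12_splits = subsample_sizes(1_449, [2, 4, 8, 16])
coco_splits = subsample_sizes(118_287, [32, 64, 128, 256, 512])

# Effective per-step batch sizes across 16 GPUs (4 + 4 images per GPU).
labeled_batch = 16 * 4    # 64 pixel-level labeled images per step
unlabeled_batch = 16 * 4  # 64 unlabeled / image-level labeled images per step

print(voc12_splits)  # {'1/2': 724, '1/4': 362, '1/8': 181, '1/16': 90}
print(poly_lr(0))    # 0.007 at the start of training, decaying to 0
```

For COCO, the same schedule would run with `max_steps=200_000`, matching the 200,000 iterations quoted above.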