Semi-Supervised Semantic Segmentation via Gentle Teaching Assistant

Authors: Ying Jin, Jiaqi Wang, Dahua Lin

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiment results on benchmark datasets validate that our method shows competitive performance against previous methods." and "Extensive experiments have validated that our method shows competitive performance on mainstream benchmarks, proving that it can make better utilization of unlabeled data."
Researcher Affiliation | Collaboration | 1 CUHK-SenseTime Joint Lab, The Chinese University of Hong Kong; 2 Shanghai AI Laboratory. {jy021,dhlin}@ie.cuhk.edu.hk, wjqdev@gmail.com
Pseudocode | Yes | "Algorithm 1: Gentle Teaching Assistant for Semi-Supervised Semantic Segmentation (GTA-Seg)."
Open Source Code | Yes | "Code is available at https://github.com/Jin-Ying/GTA-Seg."
Open Datasets | Yes | "We evaluate our method on 1) PASCAL VOC 2012 [11]: a widely-used benchmark dataset for semantic segmentation... 2) Cityscapes [8], an urban scene dataset..."
Dataset Splits | Yes | "We take 92, 183, 366, 732, and 1464 images from the 1464 labeled images in the original training set, and 662, 1323 and 2645 images from the 10582 labeled training images in the augmented training set." and "We sample 100, 186, 372, 744 images from the 2975 labeled images in the training set." (A split-sampling sketch follows the table.)
Hardware Specification | No | The paper mentions using "4 GPUs" and "8 GPUs" for training but does not specify the model or type of GPUs, nor any other hardware components such as CPUs.
Software Dependencies | No | The paper mentions optimizers and network architectures but does not specify software dependencies such as Python, PyTorch, or CUDA with their version numbers.
Experiment Setup | Yes | "We take SGD as the optimizer, with an initial learning rate of 0.001 and a weight decay of 0.0001 for PASCAL VOC. The learning rate of the decoder is 10 times that of the network backbone. On Cityscapes, the initial learning rate is 0.01 and the weight decay is 0.0005. Poly scheduling is applied to the learning rate with lr = lr_init * (1 - t/T)^0.9... We set the trade-off between the loss of labeled and unlabeled data µ = 1.0, the hyper-parameter ε = 1.0 in our re-weighting strategy and the EMA hyper-parameter α = 0.99 in all of our experiments. At the beginning of training, we train all three components... for one epoch as a warm-up... For pseudo labels, we abandon the 20% data with lower confidence." (Illustrative sketches of the learning-rate schedule, EMA update, and pseudo-label filtering follow the table.)
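The quoted splits (e.g. 92 labeled images out of the 1464-image original VOC training set) can be reproduced in spirit by randomly subsampling the labeled pool. The sketch below is an assumption-laden illustration: the paper excerpt does not state the sampling protocol or seed, and the image ids here are placeholders, not the official GTA-Seg split files.

```python
# Hypothetical sketch of drawing a labeled subset of the stated size; ids and
# the random protocol are assumptions, not the authors' released splits.
import random

def split_labeled_unlabeled(image_ids, num_labeled, seed=0):
    """Randomly pick `num_labeled` ids as the labeled split; the rest are unlabeled."""
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)
    return ids[:num_labeled], ids[num_labeled:]

# Example: the 92-image split of the original 1464-image VOC training set.
all_ids = [f"img_{i:04d}" for i in range(1464)]  # placeholder ids
labeled_ids, unlabeled_ids = split_labeled_unlabeled(all_ids, num_labeled=92)
print(len(labeled_ids), len(unlabeled_ids))  # 92 1372
```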
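The optimizer settings quoted above translate directly into code. Below is a minimal PyTorch sketch, assuming the reported PASCAL VOC values (base lr 0.001, weight decay 1e-4, decoder lr 10x the backbone, poly decay lr = lr_init * (1 - t/T)^0.9); the momentum value, total iteration count, and the stand-in modules are assumptions not taken from the paper.

```python
# Sketch of the reported SGD + poly-schedule setup (values for PASCAL VOC).
import torch
import torch.nn as nn

backbone = nn.Conv2d(3, 16, 3, padding=1)  # stand-in for the segmentation backbone
decoder = nn.Conv2d(16, 21, 1)             # stand-in for the decoder/head (21 VOC classes)

lr_init = 1e-3
total_iters = 80_000                        # assumed; not stated in the excerpt
optimizer = torch.optim.SGD(
    [
        {"params": backbone.parameters(), "lr": lr_init},
        {"params": decoder.parameters(), "lr": lr_init * 10},  # decoder lr is 10x the backbone
    ],
    lr=lr_init,
    momentum=0.9,                           # assumed common default; not stated in the excerpt
    weight_decay=1e-4,
)
base_lrs = [group["lr"] for group in optimizer.param_groups]

def poly_lr(optimizer, step, total_iters, power=0.9):
    """Poly scheduling: lr = lr_init * (1 - step / total_iters) ** power, per parameter group."""
    factor = (1.0 - step / total_iters) ** power
    for group, base_lr in zip(optimizer.param_groups, base_lrs):
        group["lr"] = base_lr * factor

for step in range(total_iters):
    poly_lr(optimizer, step, total_iters)
    # ... forward pass, loss, backward(), optimizer.step() would go here ...
    break  # toy loop; remove in real training
```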
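The remaining quoted hyper-parameters (EMA α = 0.99, dropping the 20% lowest-confidence pseudo labels, and the labeled/unlabeled trade-off µ = 1.0) are illustrated below. This is a hedged sketch of generic mean-teacher-style components, not the authors' Algorithm 1: the granularity of the confidence filtering (per pixel here, possibly per image in the paper) and the exact form of the ε-based re-weighting are not given in the excerpt, so the latter is omitted.

```python
# Sketch of an EMA update and confidence-based pseudo-label filtering,
# using the quoted values alpha = 0.99, drop ratio 0.2, and mu = 1.0.
import torch
import torch.nn as nn

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    """teacher <- alpha * teacher + (1 - alpha) * student, parameter-wise."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)

def filter_pseudo_labels(probs, drop_ratio=0.2):
    """probs: (N, C, H, W) softmax output; keep the most confident 80% of pixels."""
    confidence, pseudo_label = probs.max(dim=1)               # per-pixel confidence and label
    threshold = torch.quantile(confidence.flatten(), drop_ratio)
    keep_mask = confidence >= threshold                       # discard the lowest-confidence 20%
    return pseudo_label, keep_mask

# Toy usage with single-layer stand-in networks.
teacher = nn.Conv2d(3, 21, 1)
student = nn.Conv2d(3, 21, 1)
ema_update(teacher, student)

probs = torch.softmax(torch.randn(2, 21, 8, 8), dim=1)
pseudo_label, keep_mask = filter_pseudo_labels(probs)

mu = 1.0  # trade-off between labeled and unlabeled losses
# total_loss = loss_labeled + mu * loss_unlabeled   (losses not computed in this toy sketch)
```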