Label-Efficient Few-Shot Semantic Segmentation with Unsupervised Meta-Training

Authors: Jianwu Li, Kaiyue Shi, Guo-Sen Xie, Xiaofeng Liu, Jian Zhang, Tianfei Zhou

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments have been conducted on two standard benchmarks, i.e., PASCAL-5i and COCO-20i, and the results show that our method produces impressive performance without any annotations, and is comparable to fully supervised competitors even using only 20% of the annotations."
Researcher Affiliation | Academia | (1) Beijing Institute of Technology; (2) Nanjing University of Science and Technology; (3) Hohai University; (4) University of Technology Sydney
Pseudocode | No | The paper describes the method in text but does not include structured pseudocode or an algorithm block.
Open Source Code | Yes | "Our code is available at: https://github.com/SSSKYue/UMTFSS."
Open Datasets | Yes | "For Dtrain and Dtest, we follow conventions to run FSS testing on two datasets, i.e., PASCAL-5i (Shaban et al. 2017) and COCO-20i (Lin et al. 2014) for few-shot segmentation. PASCAL-5i is built from PASCAL VOC 2012 (Everingham et al. 2010) and SDS (Hariharan et al. 2014). [...] COCO-20i is built from MS COCO (Lin et al. 2014). [...] For Utrain, we use all training images in COCO-20i (Lin et al. 2014), including 82,010 images in total. Note that for ablation study, we use all images in PASCAL-5i instead, which has 5,953 images and thus makes it easier to run a large number of ablative experiments. [...] we use ImageNet (Russakovsky et al. 2015)-pretrained ResNet (He et al. 2016) as the backbone network"
Dataset Splits | No | The paper describes support and query sets within an episodic meta-training framework, where support sets act as exemplars for learning and the model is meta-tested on Dtest. However, it does not explicitly define a separate validation split, with specific percentages or counts, for hyperparameter tuning in the conventional sense.
Hardware Specification | No | The paper does not describe the specific hardware (e.g., GPU or CPU models, memory) used to run the experiments; it only mentions using an "ImageNet-pretrained ResNet as the backbone network."
Software Dependencies | No | The paper mentions optimizers (SGD, AdamW) and model architectures (ResNet, Transformer) but does not provide specific version numbers for software dependencies such as Python, PyTorch, TensorFlow, or CUDA.
Experiment Setup | Yes | "For meta-training, we follow conventions (Zhang et al. 2021; Tian et al. 2020) to set the training hyper-parameters. For fairness, we use ImageNet (Russakovsky et al. 2015)-pretrained ResNet (He et al. 2016) as the backbone network and its parameters (including BatchNorms) are frozen. For the parameters except those in Transformer layers, we use SGD as the optimizer with base learning rate 1e-2, momentum 0.9, and weight decay 1e-4. The learning rate is scheduled by the polynomial annealing policy (Chen et al. 2017). For the Transformer block, we set the number of heads for MHA to 8 and d to 256, and use Dropout with probability 0.1. For protoSHA, we set Kfg to 50 and Kbg to 100. All layers in the Transformer block are repeated 2 times and the parameters are optimized with AdamW (Loshchilov and Hutter 2017) with learning rate 1e-4 and weight decay 1e-2. For data augmentation, we use random rotation from -10° to 10°. We train 20 epochs on COCO-20i as Utrain with a batch size of 32 and crop size 473×473. For automatic task construction, we set the number of cluster centroids N to 50 for COCO-20i. For supervised meta-training on specified tasks, we finetune our unsupervised-trained model for 100 epochs on the PASCAL-5i dataset and 50 epochs on COCO-20i, with batch sizes of 4 and 16 and initial learning rates of 1e-4 and 2.5e-3, respectively."
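
To make the two-optimizer recipe quoted above concrete, here is a minimal PyTorch-style sketch: SGD with polynomial LR annealing for the parameters outside the Transformer layers, and AdamW for the Transformer block. Only the hyper-parameter values come from the quoted setup; the model layout (`backbone`, `transformer` submodules) and the poly power of 0.9 (the common DeepLab default, not stated in the excerpt) are assumptions, not the authors' released code.

```python
import torch

def build_optimizers(model, epochs, iters_per_epoch):
    # Freeze the ImageNet-pretrained backbone, BatchNorms included.
    for p in model.backbone.parameters():
        p.requires_grad = False
    model.backbone.eval()

    # Split trainable parameters into Transformer vs. everything else.
    transformer_params = list(model.transformer.parameters())
    transformer_ids = {id(p) for p in transformer_params}
    other_params = [p for p in model.parameters()
                    if p.requires_grad and id(p) not in transformer_ids]

    # SGD for non-Transformer parameters: lr 1e-2, momentum 0.9, wd 1e-4.
    sgd = torch.optim.SGD(other_params, lr=1e-2,
                          momentum=0.9, weight_decay=1e-4)
    # AdamW for the Transformer block: lr 1e-4, wd 1e-2.
    adamw = torch.optim.AdamW(transformer_params, lr=1e-4, weight_decay=1e-2)

    # Polynomial ("poly") annealing, stepped once per iteration.
    total_iters = epochs * iters_per_epoch
    poly = torch.optim.lr_scheduler.LambdaLR(
        sgd, lambda it: (1 - it / total_iters) ** 0.9)
    return sgd, adamw, poly
```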
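The "automatic task construction" step (N = 50 centroids) implies grouping unlabeled training images into pseudo-classes and sampling few-shot episodes from them. The sketch below illustrates that idea with scikit-learn's KMeans; only N = 50 is taken from the quoted setup, while the embedding source, the 1-way episode format, and all function names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_pseudo_classes(embeddings: np.ndarray, n_clusters: int = 50):
    """Cluster unlabeled image embeddings into N pseudo-classes (N = 50 for COCO-20i)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(embeddings)  # one pseudo-class id per image
    return {c: np.flatnonzero(labels == c) for c in range(n_clusters)}

def sample_episode(pseudo_classes, rng, n_shot: int = 1):
    """Sample a 1-way episode: n_shot support images plus one query from one pseudo-class."""
    cls = rng.choice(list(pseudo_classes))
    members = pseudo_classes[cls]
    assert len(members) > n_shot, "cluster too small to form an episode"
    idx = rng.choice(members, size=n_shot + 1, replace=False)
    return idx[:n_shot], idx[n_shot]  # support indices, query index

# Usage: rng = np.random.default_rng(0)
#        support, query = sample_episode(build_pseudo_classes(feats), rng)
```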