SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation

Authors: Danni Yang, Jiayi Ji, Yiwei Ma, Tianyu Guo, Haowei Wang, Xiaoshuai Sun, Rongrong Ji

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on three RES benchmarks, RefCOCO, RefCOCO+, and G-Ref, reveal its superior performance compared to fully supervised methods. Remarkably, with only 1% labeled data, our SemiRES outperforms the supervised baseline by a large margin, e.g., +18.64% gains on the RefCOCO val set."
Researcher Affiliation | Collaboration | "(1) Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, School of Informatics, Xiamen University, 361005, P.R. China. (2) Youtu Lab, Tencent, Shanghai, China."
Pseudocode | Yes | "Algorithm 1: Pseudo code for our proposed SemiRES" (a speculative sketch of the refinement idea follows the table)
Open Source Code | Yes | "The project code is available at https://github.com/nini0919/SemiRES."
Open Datasets | Yes | "We verify the effectiveness of our proposed method on three standard RES benchmark datasets, RefCOCO (Yu et al., 2016), RefCOCO+ (Yu et al., 2016), and G-Ref (Mao et al., 2016; Nagaraja et al., 2016)."
Dataset Splits | Yes | "RefCOCO and RefCOCO+ contain 19,994 and 19,992 images, with 50,000 and 49,856 annotated objects and 142,209 and 141,564 annotated expressions, respectively. RefCOCO and RefCOCO+ are split into four parts, i.e., train, val, testA, and testB." (split loading sketched below)
Hardware Specification | Yes | "We implement our SemiRES model in PyTorch (Paszke et al., 2019), training it on 4 RTX3090 GPUs with 3 labeled and 3 unlabeled samples per GPU." (batch composition sketched below)
Software Dependencies | No | The paper mentions software such as PyTorch, BERT, Swin Transformer, and SAM but does not provide version numbers for these components.
Experiment Setup | Yes | "Optimization is done using the AdamW optimizer, with an initial learning rate of 5×10⁻⁵ and weight decay of 10⁻². Data augmentation includes Random ColorJitter and Random GaussianBlur. We set the EMA rate at 0.996 and use pre-trained weights of the ViT-Huge version for SAM in generating multi-scale masks." (optimizer, augmentation, and EMA sketched below)
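
The Pseudocode row only names Algorithm 1, so the following is a speculative Python sketch of the general idea the title suggests: replacing a noisy teacher pseudo-mask with the best IoU-matched SAM segment proposal. The function name, the IoU threshold, and the single-best-match policy are assumptions, not the paper's actual Algorithm 1; consult the repository for the real procedure.

```python
import torch

@torch.no_grad()
def refine_pseudo_mask(pseudo_mask: torch.Tensor,
                       sam_masks: torch.Tensor,
                       iou_thresh: float = 0.5) -> torch.Tensor:
    """Hypothetical SAM-guided refinement of a teacher pseudo-mask.

    pseudo_mask: (H, W) bool mask predicted by the teacher on an unlabeled image.
    sam_masks:   (N, H, W) bool segment proposals from SAM for the same image.
    Returns the SAM proposal with the highest IoU against the pseudo-mask, or
    the original pseudo-mask if no proposal overlaps well enough.
    """
    if sam_masks.numel() == 0:
        return pseudo_mask
    inter = (sam_masks & pseudo_mask).flatten(1).sum(-1).float()
    union = (sam_masks | pseudo_mask).flatten(1).sum(-1).float()
    iou = inter / union.clamp(min=1.0)
    best = int(iou.argmax())
    return sam_masks[best] if iou[best] >= iou_thresh else pseudo_mask
```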
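For the Open Datasets and Dataset Splits rows, a minimal sketch of loading the quoted partitions with the widely used refer toolkit (https://github.com/lichengunc/refer); the `data_root` path and `splitBy` choice are assumptions, and the toolkit plus the RefCOCO downloads must be installed separately.

```python
# Hedged sketch: assumes the refer toolkit and the RefCOCO data are installed.
from refer import REFER

refer = REFER(data_root='data', dataset='refcoco', splitBy='unc')

# The four quoted partitions of RefCOCO / RefCOCO+: train, val, testA, testB.
for split in ('train', 'val', 'testA', 'testB'):
    ref_ids = refer.getRefIds(split=split)
    print(split, len(ref_ids))
```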
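The Hardware Specification row fixes the per-GPU batch composition at 3 labeled plus 3 unlabeled samples. A minimal single-GPU sketch with random tensors standing in for the actual RefCOCO samples; the paper shards this across 4 GPUs (e.g., via DistributedDataParallel), which is omitted here.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random tensors stand in for (image, mask) pairs and raw unlabeled images.
labeled_set = TensorDataset(torch.randn(96, 3, 64, 64), torch.zeros(96, 1, 64, 64))
unlabeled_set = TensorDataset(torch.randn(96, 3, 64, 64))

# 3 labeled + 3 unlabeled samples per GPU, as quoted.
labeled_loader = DataLoader(labeled_set, batch_size=3, shuffle=True, drop_last=True)
unlabeled_loader = DataLoader(unlabeled_set, batch_size=3, shuffle=True, drop_last=True)

for (x_l, y_l), (x_u,) in zip(labeled_loader, unlabeled_loader):
    # x_l, y_l: labeled minibatch; x_u: unlabeled minibatch for pseudo-labeling.
    pass
```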
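Finally, to make the Experiment Setup row concrete, a minimal PyTorch sketch of the optimizer, augmentation, and EMA-teacher pieces it quotes. The stand-in model and the jitter/blur strengths are assumptions; only the learning rate (5×10⁻⁵), weight decay (10⁻²), augmentation types, and EMA rate (0.996) come from the quoted setup.

```python
import copy
import torch
from torchvision import transforms

# Stand-in for the RES student network; the real model pairs Swin/BERT
# encoders with a mask decoder, which is not reproduced here.
student = torch.nn.Conv2d(3, 1, kernel_size=1)
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)  # the teacher is updated by EMA, not by gradients

# AdamW with the quoted hyperparameters.
optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5, weight_decay=1e-2)

# The quoted augmentations; exact strengths and probabilities are assumptions.
augment = transforms.Compose([
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    transforms.RandomApply([transforms.GaussianBlur(kernel_size=5)], p=0.5),
])

EMA_RATE = 0.996  # quoted EMA rate

@torch.no_grad()
def ema_update(teacher, student, m=EMA_RATE):
    """teacher <- m * teacher + (1 - m) * student, parameter-wise."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(m).add_(s, alpha=1.0 - m)
```

Calling `ema_update(teacher, student)` once per training step after `optimizer.step()` reproduces the standard mean-teacher schedule implied by the quoted EMA rate.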