SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation
Authors: Danni Yang, Jiayi Ji, Yiwei Ma, Tianyu Guo, Haowei Wang, Xiaoshuai Sun, Rongrong Ji
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on three RES benchmarks, RefCOCO, RefCOCO+, and G-Ref, reveal its superior performance compared to fully supervised methods. Remarkably, with only 1% labeled data, our SemiRES outperforms the supervised baseline by a large margin, e.g., +18.64% gains on the RefCOCO val set. |
| Researcher Affiliation | Collaboration | 1Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, School of Informatics, Xiamen University, 361005, P.R. China. 2Youtu Lab, Tencent, Shanghai, China. |
| Pseudocode | Yes | Algorithm 1: Pseudocode for our proposed SemiRES |
| Open Source Code | Yes | The project code is available at https://github.com/nini0919/SemiRES. |
| Open Datasets | Yes | We verify the effectiveness of our proposed method on three standard RES benchmark datasets: RefCOCO (Yu et al., 2016), RefCOCO+ (Yu et al., 2016), and G-Ref (Mao et al., 2016; Nagaraja et al., 2016). |
| Dataset Splits | Yes | RefCOCO and RefCOCO+ contain 19,994 and 19,992 images, with 50,000 and 49,856 annotated objects and 142,209 and 141,564 annotated expressions, respectively. RefCOCO and RefCOCO+ are split into four parts, i.e., train, val, testA, and testB. |
| Hardware Specification | Yes | We implement our SemiRES model in PyTorch (Paszke et al., 2019), training it on 4 RTX 3090 GPUs with 3 labeled and 3 unlabeled samples per GPU. |
| Software Dependencies | No | The paper mentions software like PyTorch, BERT, Swin Transformer, and SAM but does not provide specific version numbers for these components. |
| Experiment Setup | Yes | Optimization is done using the AdamW optimizer, with an initial learning rate of 5×10⁻⁵ and weight decay of 10⁻². Data augmentation includes Random Color Jitter and Random Gaussian Blur. We set the EMA rate at 0.996 and use pre-trained weights of the ViT-Huge version for SAM in generating multi-scale masks. |
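The EMA rate of 0.996 quoted above refers to the exponential-moving-average update of a teacher model from the student in teacher-student semi-supervised training. A minimal sketch of that update, assuming a standard teacher-student scheme (the function and variable names here are illustrative, not taken from the released SemiRES code):

```python
# Sketch of the EMA teacher update used in teacher-student semi-supervised
# training: teacher <- rate * teacher + (1 - rate) * student.
# The rate 0.996 is the value reported in the paper's experiment setup.

EMA_RATE = 0.996


def ema_update(teacher_params, student_params, rate=EMA_RATE):
    """Move each teacher parameter slightly toward its student counterpart.

    With rate close to 1, the teacher evolves slowly, giving more stable
    pseudo-label predictions than the rapidly changing student.
    """
    return [rate * t + (1.0 - rate) * s
            for t, s in zip(teacher_params, student_params)]


# Toy example with scalar "parameters": after one update the teacher
# retains 99.6% of its old value and absorbs 0.4% of the student's.
teacher = [1.0, 0.0]
student = [0.0, 1.0]
teacher = ema_update(teacher, student)
```

In practice this update is applied to every parameter tensor after each optimizer step, with gradients disabled for the teacher.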