SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation

Authors: Danni Yang, Jiayi Ji, Yiwei Ma, Tianyu Guo, Haowei Wang, Xiaoshuai Sun, Rongrong Ji

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on three RES benchmarks, RefCOCO, RefCOCO+, and G-Ref, reveal its superior performance compared to fully supervised methods. Remarkably, with only 1% labeled data, our SemiRES outperforms the supervised baseline by a large margin, e.g., +18.64% gains on the RefCOCO val set."
Researcher Affiliation | Collaboration | "(1) Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, School of Informatics, Xiamen University, 361005, P.R. China. (2) Youtu Lab, Tencent, Shanghai, China."
Pseudocode | Yes | "Algorithm 1: Pseudo code for our proposed SemiRES" (a speculative sketch of the refinement idea follows the table)
Open Source Code | Yes | "The project code is available at https://github.com/nini0919/SemiRES."
Open Datasets | Yes | "We verify the effectiveness of our proposed method on three standard RES benchmark datasets, RefCOCO (Yu et al., 2016), RefCOCO+ (Yu et al., 2016), and G-Ref (Mao et al., 2016; Nagaraja et al., 2016)."
Dataset Splits | Yes | "RefCOCO and RefCOCO+ contain 19,994 and 19,992 images, with 50,000 and 49,856 annotated objects and 142,209 and 141,564 annotated expressions, respectively. RefCOCO and RefCOCO+ are split into four parts, i.e., train, val, testA, and testB." (split loading sketched below)
Hardware Specification | Yes | "We implement our SemiRES model in PyTorch (Paszke et al., 2019), training it on 4 RTX3090 GPUs with 3 labeled and 3 unlabeled samples per GPU." (batch composition sketched below)
Software Dependencies | No | The paper mentions software such as PyTorch, BERT, Swin Transformer, and SAM but does not provide version numbers for these components.
Experiment Setup | Yes | "Optimization is done using the AdamW optimizer, with an initial learning rate of 5×10⁻⁵ and weight decay of 10⁻². Data augmentation includes Random ColorJitter and Random GaussianBlur. We set the EMA rate at 0.996 and use pre-trained weights of the ViT-Huge version for SAM in generating multi-scale masks." (optimizer, augmentation, and EMA sketched below)
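
The Pseudocode row only names Algorithm 1, so the following is a speculative Python sketch of the general idea the title suggests: replacing a noisy teacher pseudo-mask with the best IoU-matched SAM segment proposal. The function name, the IoU threshold, and the single-best-match policy are assumptions, not the paper's actual Algorithm 1; consult the repository for the real procedure.

```python
import torch

@torch.no_grad()
def refine_pseudo_mask(pseudo_mask: torch.Tensor,
                       sam_masks: torch.Tensor,
                       iou_thresh: float = 0.5) -> torch.Tensor:
    """Hypothetical SAM-guided refinement of a teacher pseudo-mask.

    pseudo_mask: (H, W) bool mask predicted by the teacher on an unlabeled image.
    sam_masks:   (N, H, W) bool segment proposals from SAM for the same image.
    Returns the SAM proposal with the highest IoU against the pseudo-mask, or
    the original pseudo-mask if no proposal overlaps well enough.
    """
    if sam_masks.numel() == 0:
        return pseudo_mask
    inter = (sam_masks & pseudo_mask).flatten(1).sum(-1).float()
    union = (sam_masks | pseudo_mask).flatten(1).sum(-1).float()
    iou = inter / union.clamp(min=1.0)
    best = int(iou.argmax())
    return sam_masks[best] if iou[best] >= iou_thresh else pseudo_mask
```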
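For the Open Datasets and Dataset Splits rows, a minimal sketch of loading the quoted partitions with the widely used refer toolkit (https://github.com/lichengunc/refer); the `data_root` path and `splitBy` choice are assumptions, and the toolkit plus the RefCOCO downloads must be installed separately.

```python
# Hedged sketch: assumes the refer toolkit and the RefCOCO data are installed.
from refer import REFER

refer = REFER(data_root='data', dataset='refcoco', splitBy='unc')

# The four quoted partitions of RefCOCO / RefCOCO+: train, val, testA, testB.
for split in ('train', 'val', 'testA', 'testB'):
    ref_ids = refer.getRefIds(split=split)
    print(split, len(ref_ids))
```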
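The Hardware Specification row fixes the per-GPU batch composition at 3 labeled plus 3 unlabeled samples. A minimal single-GPU sketch with random tensors standing in for the actual RefCOCO samples; the paper shards this across 4 GPUs (e.g., via DistributedDataParallel), which is omitted here.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random tensors stand in for (image, mask) pairs and raw unlabeled images.
labeled_set = TensorDataset(torch.randn(96, 3, 64, 64), torch.zeros(96, 1, 64, 64))
unlabeled_set = TensorDataset(torch.randn(96, 3, 64, 64))

# 3 labeled + 3 unlabeled samples per GPU, as quoted.
labeled_loader = DataLoader(labeled_set, batch_size=3, shuffle=True, drop_last=True)
unlabeled_loader = DataLoader(unlabeled_set, batch_size=3, shuffle=True, drop_last=True)

for (x_l, y_l), (x_u,) in zip(labeled_loader, unlabeled_loader):
    # x_l, y_l: labeled minibatch; x_u: unlabeled minibatch for pseudo-labeling.
    pass
```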
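Finally, to make the Experiment Setup row concrete, a minimal PyTorch sketch of the optimizer, augmentation, and EMA-teacher pieces it quotes. The stand-in model and the jitter/blur strengths are assumptions; only the learning rate (5×10⁻⁵), weight decay (10⁻²), augmentation types, and EMA rate (0.996) come from the quoted setup.

```python
import copy
import torch
from torchvision import transforms

# Stand-in for the RES student network; the real model pairs Swin/BERT
# encoders with a mask decoder, which is not reproduced here.
student = torch.nn.Conv2d(3, 1, kernel_size=1)
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)  # the teacher is updated by EMA, not by gradients

# AdamW with the quoted hyperparameters.
optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5, weight_decay=1e-2)

# The quoted augmentations; exact strengths and probabilities are assumptions.
augment = transforms.Compose([
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    transforms.RandomApply([transforms.GaussianBlur(kernel_size=5)], p=0.5),
])

EMA_RATE = 0.996  # quoted EMA rate

@torch.no_grad()
def ema_update(teacher, student, m=EMA_RATE):
    """teacher <- m * teacher + (1 - m) * student, parameter-wise."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(m).add_(s, alpha=1.0 - m)
```

Calling `ema_update(teacher, student)` once per training step after `optimizer.step()` reproduces the standard mean-teacher schedule implied by the quoted EMA rate.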