Boosting Weakly Supervised Referring Image Segmentation via Progressive Comprehension

Authors: Zaiquan Yang, Yuhao Liu, Jiaying Lin, Gerhard Hancke, Rynson Lau

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments show that our method outperforms SOTA methods on three common benchmarks."
Researcher Affiliation | Academia | "Department of Computer Science, City University of Hong Kong. {zaiquyang2-c, yuhliu9-c, jiayinlin5-c}@my.cityu.edu.hk; {gp.hancke, Rynson.Lau}@cityu.edu.hk"
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | "We will consider releasing the data and code once the paper is accepted."
Open Datasets | Yes | "We have conducted experiments on three standard benchmarks: RefCOCO [61], RefCOCO+ [61], and RefCOCOg [39]. They are constructed based on MSCOCO [24]."
Dataset Splits | Yes | "Table 1: Quantitative comparison using mIoU and PointM metrics. '(U)' and '(G)' indicate the UMD and Google partitions." The quoted table reports the Val / Test A / Test B splits of RefCOCO and RefCOCO+, and the Val (G), Val (U), and Test (U) splits of RefCOCOg (see the split-loading sketch after the table).
Hardware Specification | Yes | "We train our framework for 15 epochs with a batch size of 36 on an RTX 4090 GPU."
Software Dependencies | No | The paper mentions PyTorch and the CLIP and Mistral 7B models, but does not provide version numbers for any of these software components.
Experiment Setup | Yes | "We train our framework for 15 epochs with a batch size of 36... The input images are resized to 320×320. ... The network is optimized using the AdamW optimizer [37] with a weight decay of 1e-2 and an initial learning rate of 5e-5 with polynomial learning rate decay. For the LLM, we utilize the open-source powerful language model Mistral 7B [16] for referring text decomposition. For the proposal generator, we set the number of extracted proposals P = 40 for each image." (See the training-configuration sketch after the table.)
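For readers unfamiliar with the (U)/(G) naming, the UMD and Google partitions are the standard RefCOCOg splits distributed with the `refer` annotation toolkit (github.com/lichengunc/refer). The paper does not say which loader it uses, so the following is only a minimal sketch of how these splits are conventionally enumerated; `data_root` is a hypothetical path to the downloaded annotation files.

```python
from refer import REFER  # annotation toolkit from github.com/lichengunc/refer

data_root = 'data'  # hypothetical path to the downloaded annotations

# RefCOCO and RefCOCO+ use the UNC partition: val / testA / testB.
for dataset in ('refcoco', 'refcoco+'):
    refer = REFER(data_root, dataset=dataset, splitBy='unc')
    for split in ('val', 'testA', 'testB'):
        print(dataset, split, len(refer.getRefIds(split=split)))

# RefCOCOg ships with two partitions: Google ("G") and UMD ("U").
refer_g = REFER(data_root, dataset='refcocog', splitBy='google')
print('refcocog Val (G):', len(refer_g.getRefIds(split='val')))

refer_u = REFER(data_root, dataset='refcocog', splitBy='umd')
for split in ('val', 'test'):
    print(f'refcocog {split} (U):', len(refer_u.getRefIds(split=split)))
```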
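The reported hyperparameters map directly onto standard PyTorch components. The sketch below is a plausible reconstruction under stated assumptions, not the authors' released code: the model is a stand-in module, the data pipeline is omitted, and the decay power and steps per epoch are guesses since the paper specifies neither.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import PolynomialLR

EPOCHS = 15          # "15 epochs"
BATCH_SIZE = 36      # "batch size of 36"
IMAGE_SIZE = 320     # inputs resized to 320x320
NUM_PROPOSALS = 40   # proposal generator: P = 40 per image

# Stand-in module; the paper's CLIP-based network is not released.
model = torch.nn.Linear(16, 16)

optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=1e-2)

# Polynomial LR decay over the full schedule. steps_per_epoch and
# power=0.9 are assumptions not stated in the paper.
steps_per_epoch = 100
scheduler = PolynomialLR(optimizer, total_iters=EPOCHS * steps_per_epoch, power=0.9)

for epoch in range(EPOCHS):
    for _ in range(steps_per_epoch):
        optimizer.zero_grad()
        # Dummy loss standing in for the paper's training objective.
        loss = model(torch.randn(BATCH_SIZE, 16)).pow(2).mean()
        loss.backward()
        optimizer.step()
        scheduler.step()
```

Note that `PolynomialLR` with power=1.0 reduces to linear decay; 0.9 is a common choice in segmentation work, which is why it is assumed here.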