Boosting Weakly Supervised Referring Image Segmentation via Progressive Comprehension
Authors: Zaiquan Yang, Yuhao Liu, Jiaying Lin, Gerhard Hancke, Rynson Lau
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our method outperforms SOTA methods on three common benchmarks. |
| Researcher Affiliation | Academia | Department of Computer Science, City University of Hong Kong {zaiquyang2-c, yuhliu9-c, jiayinlin5-c}@my.cityu.edu.hk {gp.hancke, Rynson.Lau}@cityu.edu.hk |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | We will consider releasing the data and code once the paper is accepted. |
| Open Datasets | Yes | We have conducted experiments on three standard benchmarks: RefCOCO [61], RefCOCO+ [61], and RefCOCOg [39]. They are constructed based on MSCOCO [24]. |
| Dataset Splits | Yes | Table 1: Quantitative comparison using mIoU and PointM metrics. "(U)" and "(G)" indicate the UMD and Google partitions. ... Val, Test A, Test B (RefCOCO); Val, Test A, Test B (RefCOCO+); Val (G), Val (U), Test (U) (RefCOCOg) |
| Hardware Specification | Yes | We train our framework for 15 epochs with a batch size of 36 on an RTX 4090 GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch' and using 'CLIP' and 'Mistral 7B' models, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We train our framework for 15 epochs with a batch size of 36... The input images are resized to 320×320. ... The network is optimized using the AdamW optimizer [37] with a weight decay of 1e-2 and an initial learning rate of 5e-5 with polynomial learning rate decay. For the LLM, we utilize the powerful open-source language model Mistral 7B [16] for referring text decomposition. For the proposal generator, we set the number of extracted proposals P = 40 for each image. |
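For concreteness, the reported optimizer and schedule can be sketched in PyTorch as below. This is a minimal reconstruction from the quoted setup, not the authors' released code: the `model` placeholder and the polynomial decay power are assumptions, since the paper specifies neither.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import PolynomialLR

# Hyperparameters as reported in the paper.
EPOCHS = 15         # training epochs
BATCH_SIZE = 36     # single RTX 4090 GPU
IMAGE_SIZE = 320    # inputs resized to 320x320
NUM_PROPOSALS = 40  # P = 40 proposals extracted per image

# Placeholder module standing in for the paper's full framework
# (the actual network is not released).
model = nn.Linear(512, 1)

# AdamW with the quoted weight decay and initial learning rate.
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=1e-2)

# The paper states "polynomial learning rate decay" without a power;
# power=1.0 (linear decay over the 15 epochs) is an assumption here.
scheduler = PolynomialLR(optimizer, total_iters=EPOCHS, power=1.0)

for epoch in range(EPOCHS):
    # ... one pass over the training set would go here ...
    scheduler.step()  # decay the learning rate once per epoch
```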