Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].
Boosting Weakly Supervised Referring Image Segmentation via Progressive Comprehension
Authors: Zaiquan Yang, Yuhao LIU, Jiaying Lin, Gerhard Hancke, Rynson Lau
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our method outperforms SOTA methods on three common benchmarks. |
| Researcher Affiliation | Academia | Department of Computer Science, City University of Hong Kong |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | We will consider releasing the data and code once the paper is accepted. |
| Open Datasets | Yes | We have conducted experiments on three standard benchmarks: RefCOCO [61], RefCOCO+ [61], and RefCOCOg [39]. They are constructed based on MSCOCO [24]. |
| Dataset Splits | Yes | Table 1: Quantitative comparison using mIoU and PointM metrics. "(U)" and "(G)" indicate the UMD and Google partitions. ... Column headers: Val, TestA, TestB; Val, TestA, TestB; Val (G), Val (U), Test (U) |
| Hardware Specification | Yes | We train our framework for 15 epochs with a batch size of 36 on an RTX4090 GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch' and using 'CLIP' and 'Mistral 7B' models, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We train our framework for 15 epochs with a batch size of 36... The input images are resized to 320x320. ... The network is optimized using the AdamW optimizer [37] with a weight decay of 1e-2 and an initial learning rate of 5e-5 with polynomial learning rate decay. For the LLM, we utilize the open-source powerful language model Mistral 7B [16] for referring text decomposition. For the proposal generator, we set the number of extracted proposals P = 40 for each image. |
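
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration, which may help when checking a reimplementation against the paper. The sketch below is illustrative only and is not the authors' code: the names `TrainConfig` and `poly_lr` are hypothetical, and the decay exponent (`poly_power = 0.9`), the final learning rate of zero, and the per-epoch stepping are assumptions, since the quoted text states only that polynomial decay is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values quoted in the Experiment Setup row.
    epochs: int = 15
    batch_size: int = 36
    image_size: int = 320        # inputs resized to 320x320
    weight_decay: float = 1e-2   # AdamW weight decay
    base_lr: float = 5e-5        # initial learning rate
    num_proposals: int = 40      # P, proposals extracted per image
    # Assumptions: neither value is given in the quoted text.
    poly_power: float = 0.9
    final_lr: float = 0.0


def poly_lr(step: int, total_steps: int, cfg: TrainConfig) -> float:
    """Polynomial decay from cfg.base_lr to cfg.final_lr over total_steps."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so that out-of-range steps map to the start or end of the schedule.
    progress = min(max(step, 0), total_steps) / total_steps
    scale = (1.0 - progress) ** cfg.poly_power
    return cfg.final_lr + (cfg.base_lr - cfg.final_lr) * scale


cfg = TrainConfig()
# Assumed per-epoch stepping; a per-iteration schedule would pass
# iteration counts instead of epoch counts.
schedule = [poly_lr(epoch, cfg.epochs, cfg) for epoch in range(cfg.epochs + 1)]
```

Under these assumptions the schedule starts at 5e-5 and falls monotonically to zero after the final epoch. A different exponent or a nonzero floor would change the shape of the curve but not the starting value.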