CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation

Authors: Zicheng Zhang, Yi Zhu, Jianzhuang Liu, Xiaodan Liang, Wei Ke

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | "Extensive experiments on popular datasets (e.g., RefCOCO and G-Ref) show that our method achieves consistent improvements over state-of-the-art methods, e.g., about 2% oIoU increase on the validation and testing sets of RefCOCO." "To validate the effectiveness of our method, we conduct extensive experiments on popular datasets (RefCOCO [18] and G-Ref [32])."
Researcher Affiliation | Collaboration | "Zicheng Zhang1, Yi Zhu2, Jianzhuang Liu2, Xiaodan Liang3, Wei Ke1. 1Xi'an Jiaotong University; 2Noah's Ark Lab, Huawei Technologies; 3Sun Yat-sen University"
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Code will be available at https://gitee.com/mindspore/models/tree/master/research/cv/CoupAlign."
Open Datasets | Yes | "We evaluate our method on four widely-used datasets: RefCOCO [46], RefCOCO+ [46], G-Ref [32] and ReferIt [18]. RefCOCO consists of 142,209 referring expressions for 50,000 objects in 19,994 images and RefCOCO+ has 141,564 referring expressions for 49,856 objects in 19,992 images. G-Ref was collected on Amazon Mechanical Turk in a non-interactive setting and consists of 85,474 referring expressions for 54,822 objects in 26,711 images. ReferIt has 130,525 referring expressions in 19,894 images, which are collected from IAPR TC-12 [9]." "If your work uses existing assets, did you cite the creators? [Yes] Please see the Experiment and Reference sections. ... The data and models used in our work are publicly released."
Dataset Splits | Yes | "Extensive experiments on popular datasets (e.g., RefCOCO and G-Ref) show that our method achieves consistent improvements over state-of-the-art methods, e.g., about 2% oIoU increase on the validation and testing sets of RefCOCO." "We evaluate our method on four widely-used datasets: RefCOCO [46], RefCOCO+ [46], G-Ref [32] and ReferIt [18]." "In Tab. 1, we compare CoupAlign with state-of-the-art (SOTA) RIS methods by the oIoU metric. ... RefCOCO val, testA, testB..."
Hardware Specification | Yes | "All experiments are conducted on an NVIDIA 3090 GPU."
Software Dependencies | No | The paper mentions software components like "MindSpore Lite tool [1]" and "BERT-base [6] with 12 layers" but does not provide specific version numbers for these tools or other key libraries.
Experiment Setup | Yes | "The number of queries of the mask generator N is 100. The rest of the weights in our model are randomly initialized. We adopt the AdamW [28] optimizer with weight decay 0.01. We adopt the polynomial learning rate decay schedule and set the initial learning rate to 3e-5, the end learning rate to 1.5e-5 and the max decay epoch to 25. Our model is trained for 50 epochs with batch size 16. The images are resized to 448×448 without specific data augmentation and the maximum sentence length is set to 30. The weight of the auxiliary loss λ is set to 0.1."
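The polynomial decay schedule quoted above can be sketched as follows. Note this is a minimal illustration, not the authors' code: the paper gives the initial/end learning rates and the max decay epoch but does not state the decay power, so linear decay (power = 1.0) is assumed, and the function name is hypothetical.

```python
def poly_lr(epoch, init_lr=3e-5, end_lr=1.5e-5, max_decay_epoch=25, power=1.0):
    """Polynomial learning-rate decay.

    Decays from init_lr to end_lr over max_decay_epoch epochs, then
    holds at end_lr for the remaining epochs (training runs for 50
    epochs in the paper). `power` is an assumption; 1.0 means linear.
    """
    if epoch >= max_decay_epoch:
        return end_lr
    remaining = 1.0 - epoch / max_decay_epoch  # fraction of decay left
    return end_lr + (init_lr - end_lr) * (remaining ** power)
```

With these defaults the rate starts at 3e-5, reaches 1.5e-5 at epoch 25, and stays there through epoch 50.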