WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation

Authors: Zesen Cheng, Peng Jin, Hao Li, Kehan Li, Siheng Li, Xiangyang Ji, Chang Liu, Jie Chen

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "With our WiCo, several prominent top-down and bottom-up combinations achieve remarkable improvements on three common datasets with reasonable extra costs, which justifies effectiveness and generality of our method." "Our model is evaluated on three standard referring image segmentation datasets: RefCOCO [Yu et al., 2016], RefCOCO+ [Yu et al., 2016] and RefCOCOg [Mao et al., 2016]." |
| Researcher Affiliation | Academia | ¹School of Electronic and Computer Engineering, Peking University, Shenzhen, China; ²AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, China; ³Peng Cheng Laboratory, Shenzhen, China; ⁴Tsinghua University, Beijing, China. Emails: {cyanlaser, jp21, kehanli}@stu.pku.edu.cn; lisiheng21@mails.tsinghua.edu.cn; {xyji, liuchang2022}@tsinghua.edu.cn; {lihao1984, jiechen2019}@pku.edu.cn |
| Pseudocode | No | None found. |
| Open Source Code | No | None found. |
| Open Datasets | Yes | "Our model is evaluated on three standard referring image segmentation datasets: RefCOCO [Yu et al., 2016], RefCOCO+ [Yu et al., 2016] and RefCOCOg [Mao et al., 2016]." |
| Dataset Splits | No | "Our model is evaluated on three standard referring image segmentation datasets: RefCOCO [Yu et al., 2016], RefCOCO+ [Yu et al., 2016] and RefCOCOg [Mao et al., 2016]. The data preprocessing operations are in line with the original implementation of those selected methods." |
| Hardware Specification | Yes | "We train our models for 5,000 iterations on an NVIDIA V100 with a batch size of 24." |
| Software Dependencies | No | "AdamW [Loshchilov and Hutter, 2017] is adopted as our optimizer, and the learning rate and weight decay are set to 1e-5 and 5e-2." |
| Experiment Setup | Yes | "AdamW [Loshchilov and Hutter, 2017] is adopted as our optimizer, and the learning rate and weight decay are set to 1e-5 and 5e-2. We train our models for 5,000 iterations on an NVIDIA V100 with a batch size of 24. To binarize the probability map and get segmentation results, the threshold τ is set to 0.35 to calibrate previous works [Ding et al., 2021]." |
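
The Experiment Setup row pins down the full optimization recipe (AdamW, lr 1e-5, weight decay 5e-2, 5,000 iterations, batch size 24, binarization threshold τ = 0.35). Below is a minimal PyTorch sketch of that recipe, assuming a generic segmentation model: `DummySegModel` and the random tensors are placeholders standing in for the authors' actual WiCo architecture and the RefCOCO data loaders, which are not publicly released.

```python
"""Sketch of the reported training/inference setup; only the hyperparameters
come from the paper -- the model and data below are illustrative stand-ins."""
import torch
import torch.nn as nn

class DummySegModel(nn.Module):
    """Placeholder model: a 1x1 conv producing a per-pixel logit map."""
    def __init__(self):
        super().__init__()
        self.head = nn.Conv2d(3, 1, kernel_size=1)

    def forward(self, x):
        return self.head(x)

model = DummySegModel()
# AdamW with lr = 1e-5 and weight decay = 5e-2, as reported in the paper
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=5e-2)
criterion = nn.BCEWithLogitsLoss()

TAU = 0.35  # threshold for binarizing the probability map (from the paper)

for step in range(5000):  # 5,000 training iterations
    images = torch.randn(24, 3, 64, 64)                    # batch size 24 (random stand-in data)
    targets = torch.randint(0, 2, (24, 1, 64, 64)).float() # stand-in binary masks
    logits = model(images)
    loss = criterion(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Inference: binarize the predicted probability map at tau = 0.35
with torch.no_grad():
    probs = torch.sigmoid(model(images))
pred_mask = (probs > TAU).float()
```

The threshold of 0.35 is the paper's reported calibration value (chosen for comparability with [Ding et al., 2021]); everything about the model architecture itself is deliberately left abstract here.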