Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation

Authors: Zhao Yang, Jiaqi Wang, Yansong Tang, Kai Chen, Hengshuang Zhao, Philip H.S. Torr

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results on the challenging datasets of RefCOCO, RefCOCO+, and G-Ref demonstrate its advantage with respect to the state-of-the-art methods."
Researcher Affiliation | Collaboration | ¹University of Oxford, ²Shanghai AI Laboratory, ³Tsinghua-Berkeley Shenzhen Institute, Tsinghua University, ⁴The University of Hong Kong
Pseudocode | No | The paper includes schematic illustrations of the proposed method (e.g., Figure 3), but it does not contain a formal pseudocode block or algorithm section.
Open Source Code | No | The paper does not provide any explicit statement about releasing their source code, nor does it include a link to a code repository for their method.
Open Datasets | Yes | "We evaluate our proposed method on the datasets of RefCOCO (Yu et al. 2016), RefCOCO+ (Yu et al. 2016), and G-Ref (Mao et al. 2016; Nagaraja, Morariu, and Davis 2016)."
Dataset Splits | Yes | "For each dataset, the model is trained on the training set for 40 epochs with batch size 32... Images are resized to 480 × 480 resolution... On the validation, test A, and test B subsets of RefCOCO..."
Hardware Specification | Yes | "We measure the inference time by averaging over 500 forward passes using batch size 1 at 480 × 480 input resolution on an NVIDIA Quadro RTX 8000."
Software Dependencies | No | The paper mentions key software components, such as the BERT-base model from (Devlin et al. 2019), the Swin-B model from (Liu et al. 2021b), and Hugging Face (Wolf et al. 2020), but it does not specify version numbers for general software libraries or frameworks (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | "We adopt an AdamW (Loshchilov and Hutter 2019) optimizer with initial learning rate 5e-5 and weight decay 1e-2, and apply the poly learning rate scheduler (Chen et al. 2018). The default number of iterations (n in Sec. 3) is 3, for which the loss weights, λ1, λ2, and λ3, are 0.15, 0.15, and 0.7, respectively. For each dataset, the model is trained on the training set for 40 epochs with batch size 32, where each object is sampled exactly once in an epoch (with one of its text annotations randomly sampled). Images are resized to 480 × 480 resolution and sentence lengths are capped at 20."