Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation
Authors: Zhao Yang, Jiaqi Wang, Yansong Tang, Kai Chen, Hengshuang Zhao, Philip H.S. Torr
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the challenging datasets of RefCOCO, RefCOCO+, and G-Ref demonstrate its advantage with respect to the state-of-the-art methods. |
| Researcher Affiliation | Collaboration | 1University of Oxford, 2Shanghai AI Laboratory, 3Tsinghua-Berkeley Shenzhen Institute, Tsinghua University, 4The University of Hong Kong |
| Pseudocode | No | The paper includes schematic illustrations of the proposed method (e.g., Figure 3), but it does not contain a formal pseudocode block or algorithm section. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing their source code, nor does it include a link to a code repository for their method. |
| Open Datasets | Yes | We evaluate our proposed method on the datasets of RefCOCO (Yu et al. 2016), RefCOCO+ (Yu et al. 2016), and G-Ref (Mao et al. 2016; Nagaraja, Morariu, and Davis 2016). |
| Dataset Splits | Yes | For each dataset, the model is trained on the training set for 40 epochs with batch size 32... Images are resized to 480 × 480 resolution... On the validation, test A, and test B subsets of RefCOCO... |
| Hardware Specification | Yes | We measure the inference time by averaging over 500 forward passes using batch size 1 at 480 × 480 input resolution on an NVIDIA Quadro RTX 8000. (A hedged timing sketch follows the table.) |
| Software Dependencies | No | The paper mentions key software components like 'BERT-base model from (Devlin et al. 2019)' and 'Swin-B model from (Liu et al. 2021b)' and 'Hugging Face (Wolf et al. 2020)', but it does not specify version numbers for general software libraries or frameworks (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We adopt an AdamW (Loshchilov and Hutter 2019) optimizer with initial learning rate 5e-5 and weight decay 1e-2, and apply the poly learning rate scheduler (Chen et al. 2018). The default number of iterations (n in Sec. 3) is 3, for which the loss weights, λ1, λ2, and λ3, are 0.15, 0.15, 0.7, respectively. For each dataset, the model is trained on the training set for 40 epochs with batch size 32, where each object is sampled exactly once in an epoch (with one of its text annotations randomly sampled). Images are resized to 480 × 480 resolution and sentence lengths are capped at 20. (A hedged configuration sketch follows the table.) |
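As a reading aid, the following is a minimal PyTorch sketch of the training configuration quoted in the Experiment Setup row (AdamW with learning rate 5e-5 and weight decay 1e-2, a poly learning-rate schedule, and loss weights 0.15/0.15/0.7 over n = 3 iterations). The poly power of 0.9, the `model` interface, and the per-iteration loss hook are illustrative assumptions, not details taken from the paper.

```python
import torch

def build_optimizer_and_scheduler(model, iters_per_epoch, epochs=40,
                                  lr=5e-5, weight_decay=1e-2, power=0.9):
    """Optimizer/scheduler matching the quoted setup; `power=0.9` is an assumed poly exponent."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    total_iters = epochs * iters_per_epoch
    # Poly schedule: scale the initial lr by (1 - iter / total_iters) ** power.
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lambda it: max(0.0, 1 - it / total_iters) ** power)
    return optimizer, scheduler

# Loss weights for the n = 3 localization/refinement iterations (lambda_1, lambda_2, lambda_3).
LOSS_WEIGHTS = (0.15, 0.15, 0.7)

def total_loss(per_iteration_losses):
    # Weighted sum of the per-iteration segmentation losses (interface assumed for illustration).
    return sum(w * l for w, l in zip(LOSS_WEIGHTS, per_iteration_losses))
```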
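Likewise, a hypothetical sketch of the inference-timing protocol from the Hardware Specification row (500 forward passes averaged at batch size 1 and 480 × 480 input). Only the pass count, batch size, and resolution come from the paper; the model call signature, dummy BERT-style token inputs, and warm-up passes are assumptions.

```python
import torch

@torch.no_grad()
def average_inference_time_ms(model, n_passes=500, resolution=480, device="cuda"):
    """Average forward-pass latency (ms) at batch size 1, following the quoted protocol."""
    model.eval().to(device)
    dummy_image = torch.randn(1, 3, resolution, resolution, device=device)
    dummy_tokens = torch.randint(0, 30522, (1, 20), device=device)  # sentence length capped at 20; BERT vocab assumed

    # Warm-up passes so CUDA kernels are initialized before timing.
    for _ in range(10):
        model(dummy_image, dummy_tokens)
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(n_passes):
        model(dummy_image, dummy_tokens)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / n_passes  # milliseconds per forward pass
```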