SLViT: Scale-Wise Language-Guided Vision Transformer for Referring Image Segmentation
Authors: Shuyi Ouyang, Hongyi Wang, Shiao Xie, Ziwei Niu, Ruofeng Tong, Yen-Wei Chen, Lanfen Lin
IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have evaluated our method on three benchmark datasets. The experimental results demonstrate that SLViT surpasses state-of-the-art methods with lower computational cost. |
| Researcher Affiliation | Collaboration | Zhejiang University, Ritsumeikan University, Zhejiang Lab |
| Pseudocode | No | The paper describes the model architecture and processes using figures and mathematical equations, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is publicly available at: https://github.com/NaturalKnight/SLViT. |
| Open Datasets | Yes | We perform experiments on three widely used benchmark datasets for referring image segmentation, including RefCOCO [Yu et al., 2016], RefCOCO+ [Yu et al., 2016], and G-Ref [Mao et al., 2016; Nagaraja et al., 2016]. |
| Dataset Splits | Yes | Results are reported per split for each dataset: RefCOCO (val / testA / testB), RefCOCO+ (val / testA / testB), and G-Ref (val(U) / test(U) / val(G)). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using PyTorch and Hugging Face's Transformers library, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Following prior work, we use the AdamW optimizer with weight decay 0.01. The learning rate is initialized to 3e-5 and scheduled by polynomial learning rate decay with a power of 0.9. All the models are trained for 60 epochs with a batch size of 16. Each reference has 2-3 sentences on average, and we randomly sample one referring expression per object in an epoch. Image size is adjusted to 480×480 without data augmentation. (See the configuration sketch below the table.) |
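
The hyperparameters quoted in the Experiment Setup row map onto a standard PyTorch optimizer and scheduler configuration. The sketch below is a minimal illustration under that assumption; it is not the authors' implementation (that lives in the linked repository), the stand-in module is a placeholder for SLViT, and the training loop body is elided.

```python
# Minimal sketch of the training configuration described in the paper
# (AdamW, weight decay 0.01, initial LR 3e-5, polynomial decay with power 0.9,
#  60 epochs, batch size 16, 480x480 inputs). The model here is a stand-in, not SLViT.
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

EPOCHS = 60          # total training epochs
BATCH_SIZE = 16      # batch size reported in the paper
BASE_LR = 3e-5       # initial learning rate
WEIGHT_DECAY = 0.01  # AdamW weight decay
POLY_POWER = 0.9     # power of the polynomial LR decay
IMAGE_SIZE = 480     # images resized to 480x480, no data augmentation

model = nn.Conv2d(3, 1, kernel_size=1)  # placeholder for the actual SLViT model
optimizer = AdamW(model.parameters(), lr=BASE_LR, weight_decay=WEIGHT_DECAY)

# Polynomial decay: lr(epoch) = BASE_LR * (1 - epoch / EPOCHS) ** POLY_POWER
scheduler = LambdaLR(optimizer, lr_lambda=lambda e: (1.0 - e / EPOCHS) ** POLY_POWER)

for epoch in range(EPOCHS):
    # ... one pass over the training data, sampling one referring expression
    # per object per epoch, with inputs resized to IMAGE_SIZE x IMAGE_SIZE ...
    scheduler.step()
```

Recent PyTorch releases also provide `torch.optim.lr_scheduler.PolynomialLR`, which implements the same schedule directly; the `LambdaLR` form is used here only to keep the decay formula explicit.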