Parallel Vertex Diffusion for Unified Visual Grounding
Authors: Zesen Cheng, Kehan Li, Peng Jin, Siheng Li, Xiangyang Ji, Li Yuan, Chang Liu, Jie Chen
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments; Experimental Setup; Main Results; Quantitative Analysis; Qualitative Analysis; Table 1: Main results on classical REC datasets; Table 2: Main results on classical RIS datasets; Table 3: Diagnostic Experiments; Table 4: Efficiency Comparison between SVG and PVD. |
| Researcher Affiliation | Academia | 1) School of Electronic and Computer Engineering, Peking University, Shenzhen, China; 2) AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, China; 3) Peng Cheng Laboratory, Shenzhen, China; 4) Tsinghua University, Beijing, China |
| Pseudocode | No | No explicit pseudocode or algorithm blocks labeled as such are present in the paper. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | Our model is evaluated on three standard referring image segmentation datasets: RefCOCO, RefCOCO+, and RefCOCOg. Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; and Zitnick, C. L. 2014. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision, 740-755. Springer. |
| Dataset Splits | Yes | Our model is evaluated on three standard referring image segmentation datasets: RefCOCO, RefCOCO+, and RefCOCOg. The tables show 'val' and 'test' columns, implying the use of the standard splits for these datasets. |
| Hardware Specification | Yes | We train our models for 100 epochs on 4 NVIDIA V100 with a batch size of 64. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as 'Python 3.8, PyTorch 1.9,' or specific solver versions. |
| Experiment Setup | Yes | The maximum sentence length n is set to 15 for RefCOCO and RefCOCO+, and 20 for RefCOCOg. The images are resized to 640 x 640. The training sampling number of mask vertexes N is set by default to 36. During the inference phase, T is set to 4 because the DDIM step is adopted for accelerating sampling speed. AdamW (Loshchilov and Hutter 2019) is adopted as the optimizer, and the learning rate and weight decay are set to 5e-4 and 5e-2. The learning rate is scaled by a decay factor of 0.1 at the 60th step. We train our models for 100 epochs on 4 NVIDIA V100 with a batch size of 64. |
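The hyperparameters quoted in the Experiment Setup row can be collected into a small sketch. This is a hedged illustration only: the authors did not release code, so `CONFIG` and `lr_at_epoch` are hypothetical names, and the step-decay rule simply restates the quoted schedule (lr 5e-4, scaled by 0.1 at the 60th step, AdamW with weight decay 5e-2).

```python
# Hypothetical config assembled from the paper's quoted setup; not the
# authors' code (none was released). All names here are illustrative.
CONFIG = {
    "max_sentence_len": {"refcoco": 15, "refcoco+": 15, "refcocog": 20},
    "image_size": (640, 640),
    "num_vertices": 36,    # training sampling number of mask vertexes N
    "ddim_steps": 4,       # inference steps T with DDIM acceleration
    "optimizer": "AdamW",  # Loshchilov and Hutter 2019
    "lr": 5e-4,
    "weight_decay": 5e-2,
    "lr_decay_epoch": 60,
    "lr_decay_factor": 0.1,
    "epochs": 100,
    "batch_size": 64,
}

def lr_at_epoch(epoch: int, cfg: dict = CONFIG) -> float:
    """Step-decay rule: the base lr is scaled by the decay factor
    once the 60th step of the quoted schedule is reached."""
    scale = cfg["lr_decay_factor"] if epoch >= cfg["lr_decay_epoch"] else 1.0
    return cfg["lr"] * scale

print(lr_at_epoch(0))   # base learning rate before the decay
print(lr_at_epoch(99))  # decayed learning rate for the remaining epochs
```

In a PyTorch training loop this schedule would typically be expressed with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60], gamma=0.1)` wrapped around an `AdamW` optimizer.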