Parallel Vertex Diffusion for Unified Visual Grounding

Authors: Zesen Cheng, Kehan Li, Peng Jin, Siheng Li, Xiangyang Ji, Li Yuan, Chang Liu, Jie Chen

AAAI 2024

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Experiments; Experimental Setup; Main Results; Quantitative Analysis; Qualitative Analysis; Table 1: Main results on classical REC datasets; Table 2: Main results on classical RIS datasets; Table 3: Diagnostic Experiments; Table 4: Efficiency Comparison between SVG and PVD.
Researcher Affiliation | Academia | 1. School of Electronic and Computer Engineering, Peking University, Shenzhen, China; 2. AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, China; 3. Peng Cheng Laboratory, Shenzhen, China; 4. Tsinghua University, Beijing, China
Pseudocode | No | No explicit pseudocode or algorithm blocks labeled as such are present in the paper.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code, nor a link to a code repository for the described methodology.
Open Datasets | Yes | "Our model is evaluated on three standard referring image segmentation datasets: RefCOCO, RefCOCO+, and RefCOCOg." Cited dataset reference: Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; and Zitnick, C. L. 2014. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision, 740-755. Springer.
Dataset Splits | Yes | "Our model is evaluated on three standard referring image segmentation datasets: RefCOCO, RefCOCO+, and RefCOCOg." The tables report 'val' and 'test' columns, implying the use of the standard splits for these datasets.
Hardware Specification | Yes | "We train our models for 100 epochs on 4 NVIDIA V100 with a batch size of 64."
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9') or specific solver versions.
Experiment Setup | Yes | "The maximum sentence length n is set to 15 for RefCOCO, RefCOCO+, and 20 for RefCOCOg. The images are resized to 640 x 640. The training sampling number of mask vertexes N is set by default to 36. During the inference phase, T is set to 4 because the DDIM step is adopted for accelerating sampling speed. AdamW (Loshchilov and Hutter 2019) is adopted as our optimizer, and the learning rate and weight decay are set to 5e-4 and 5e-2. The learning rate is scaled by a decay factor of 0.1 at the 60th step. We train our models for 100 epochs on 4 NVIDIA V100 with a batch size of 64."
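The quoted setup amounts to a simple step-decay learning-rate schedule for AdamW. Since the paper releases no code, the sketch below is a reconstruction from the reported numbers only; the constant and function names are my own, and "60th step" is read as the 60th epoch of the 100-epoch schedule.

```python
# Hyperparameters as quoted in the paper's experiment setup.
BASE_LR = 5e-4       # initial learning rate for AdamW
WEIGHT_DECAY = 5e-2  # AdamW weight decay
DECAY_FACTOR = 0.1   # multiplicative learning-rate decay
DECAY_AT = 60        # decay applied at the 60th step (assumed: epoch)
TOTAL_EPOCHS = 100   # total training epochs

def learning_rate(epoch: int) -> float:
    """Step decay: BASE_LR before the decay point, BASE_LR * 0.1 after."""
    return BASE_LR * (DECAY_FACTOR if epoch >= DECAY_AT else 1.0)

schedule = [learning_rate(e) for e in range(TOTAL_EPOCHS)]
print(schedule[0], schedule[59], schedule[60])  # 0.0005 0.0005 5e-05
```

In a PyTorch implementation this would correspond to `torch.optim.AdamW(params, lr=5e-4, weight_decay=5e-2)` combined with a `StepLR`-style scheduler (`step_size=60, gamma=0.1`).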