Parallel Vertex Diffusion for Unified Visual Grounding

Authors: Zesen Cheng, Kehan Li, Peng Jin, Siheng Li, Xiangyang Ji, Li Yuan, Chang Liu, Jie Chen

AAAI 2024

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Experiments; Experimental Setup; Main Results; Quantitative Analysis; Qualitative Analysis; Table 1: Main results on classical REC datasets; Table 2: Main results on classical RIS datasets; Table 3: Diagnostic Experiments; Table 4: Efficiency Comparison between SVG and PVD.
Researcher Affiliation | Academia | 1. School of Electronic and Computer Engineering, Peking University, Shenzhen, China; 2. AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, China; 3. Peng Cheng Laboratory, Shenzhen, China; 4. Tsinghua University, Beijing, China
Pseudocode | No | No explicit pseudocode or algorithm blocks labeled as such are present in the paper.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code, nor a link to a code repository for the described methodology.
Open Datasets | Yes | "Our model is evaluated on three standard referring image segmentation datasets: RefCOCO, RefCOCO+, and RefCOCOg." Cited dataset reference: Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; and Zitnick, C. L. 2014. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision, 740-755. Springer.
Dataset Splits | Yes | "Our model is evaluated on three standard referring image segmentation datasets: RefCOCO, RefCOCO+, and RefCOCOg." The tables report 'val' and 'test' columns, implying the use of the standard splits for these datasets.
Hardware Specification | Yes | "We train our models for 100 epochs on 4 NVIDIA V100 with a batch size of 64."
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9') or specific solver versions.
Experiment Setup | Yes | "The maximum sentence length n is set to 15 for RefCOCO, RefCOCO+, and 20 for RefCOCOg. The images are resized to 640 x 640. The training sampling number of mask vertexes N is set by default to 36. During the inference phase, T is set to 4 because the DDIM step is adopted for accelerating sampling speed. AdamW (Loshchilov and Hutter 2019) is adopted as our optimizer, and the learning rate and weight decay are set to 5e-4 and 5e-2. The learning rate is scaled by a decay factor of 0.1 at the 60th step. We train our models for 100 epochs on 4 NVIDIA V100 with a batch size of 64."
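The quoted setup amounts to a simple step-decay learning-rate schedule for AdamW. Since the paper releases no code, the sketch below is a reconstruction from the reported numbers only; the constant and function names are my own, and "60th step" is read as the 60th epoch of the 100-epoch schedule.

```python
# Hyperparameters as quoted in the paper's experiment setup.
BASE_LR = 5e-4       # initial learning rate for AdamW
WEIGHT_DECAY = 5e-2  # AdamW weight decay
DECAY_FACTOR = 0.1   # multiplicative learning-rate decay
DECAY_AT = 60        # decay applied at the 60th step (assumed: epoch)
TOTAL_EPOCHS = 100   # total training epochs

def learning_rate(epoch: int) -> float:
    """Step decay: BASE_LR before the decay point, BASE_LR * 0.1 after."""
    return BASE_LR * (DECAY_FACTOR if epoch >= DECAY_AT else 1.0)

schedule = [learning_rate(e) for e in range(TOTAL_EPOCHS)]
print(schedule[0], schedule[59], schedule[60])  # 0.0005 0.0005 5e-05
```

In a PyTorch implementation this would correspond to `torch.optim.AdamW(params, lr=5e-4, weight_decay=5e-2)` combined with a `StepLR`-style scheduler (`step_size=60, gamma=0.1`).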