Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation

Authors: Tianyu Guo, Haowei Wang, Yiwei Ma, Jiayi Ji, Xiaoshuai Sun

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on PNG benchmark datasets reveal that our approach achieves state-of-the-art performance, significantly outperforming existing methods by a considerable margin and yielding a 3.9-point improvement in overall metrics.
Researcher Affiliation | Academia | Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China {guotianyu, wanghaowei, yiweima}@stu.xmu.edu.cn, jjyxmu@gmail.com, xssun@xmu.edu.cn
Pseudocode | No | The paper includes mathematical equations and descriptive text, but no explicit pseudocode or algorithm blocks are provided.
Open Source Code | Yes | Our codes and results are available at our project webpage: https://github.com/TianyuGoGO/XPNG
Open Datasets | Yes | We trained and evaluated our model on the Panoptic Narrative Grounding (PNG) dataset (González et al. 2021)
Dataset Splits | No | In total, the PNG dataset comprises 133,103 training images and 8,380 test images, accompanied by 875,073 and 56,531 segmentation annotations, respectively. No specific validation set size or split is provided.
Hardware Specification | Yes | All experiments are conducted on an A100 GPU with a batch size of 11.
Software Dependencies | No | The paper mentions using FPN, ResNet101, and BERT models, but does not provide specific version numbers for these or any other software dependencies or libraries.
Experiment Setup | Yes | Images are resized so that the short side is 800 pixels while maintaining the aspect ratio, and the long side is 1333 pixels. For language input, ... The maximum token length is set to 230. We employ the Adam optimizer with an initial learning rate of 1e-4, which is halved every two epochs after the tenth epoch. The learning rate for BERT is set to 1e-5. The number of iteration update stages is set to 3. ... with a batch size of 11.