Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation
Authors: Tianyu Guo, Haowei Wang, Yiwei Ma, Jiayi Ji, Xiaoshuai Sun
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on PNG benchmark datasets reveal that our approach achieves state-of-the-art performance, significantly outperforming existing methods by a considerable margin and yielding a 3.9-point improvement in overall metrics. |
| Researcher Affiliation | Academia | Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China {guotianyu, wanghaowei, yiweima}@stu.xmu.edu.cn, jjyxmu@gmail.com, xssun@xmu.edu.cn |
| Pseudocode | No | The paper includes mathematical equations and descriptive text, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | Our codes and results are available at our project webpage: https://github.com/TianyuGoGO/XPNG. |
| Open Datasets | Yes | We trained and evaluated our model on the Panoptic Narrative Grounding (PNG) dataset (González et al. 2021) |
| Dataset Splits | No | In total, the PNG dataset comprises 133,103 training images and 8,380 test images, accompanied by 875,073 and 56,531 segmentation annotations, respectively. No specific validation set size or split is provided. |
| Hardware Specification | Yes | All experiments are conducted on an A100 GPU with a batch size of 11. |
| Software Dependencies | No | The paper mentions using FPN, ResNet101, and BERT models, but does not provide specific version numbers for these or any other software dependencies or libraries. |
| Experiment Setup | Yes | Images are resized so that the short side is 800 pixels while maintaining the aspect ratio, and the long side does not exceed 1333 pixels. For language input, ... The maximum token length is set to 230. We employ the Adam optimizer with an initial learning rate of 1e-4, which is halved every two epochs after the tenth epoch. The learning rate for BERT is set to 1e-5. The number of iteration update stages is set to 3. ... with a batch size of 11. (See the configuration sketch below the table.) |
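
The optimizer details in the Experiment Setup row are concrete enough to sketch in code. The following is a minimal PyTorch sketch, not the authors' released implementation: `DummyXPNG` and its submodule names are hypothetical stand-ins, and the schedule is one plausible reading of "halved every two epochs after the tenth epoch" (first halving at epoch 10, then at 12, 14, ...).

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the XPNG model: a `bert` text encoder plus the
# remaining vision/fusion parameters. Only the parameter grouping matters here.
class DummyXPNG(nn.Module):
    def __init__(self):
        super().__init__()
        self.bert = nn.Linear(8, 8)    # stands in for the BERT encoder
        self.fusion = nn.Linear(8, 8)  # stands in for the rest of the model

model = DummyXPNG()

# Two parameter groups, as reported: 1e-5 for BERT, 1e-4 for everything else.
bert_params = [p for n, p in model.named_parameters() if n.startswith("bert")]
other_params = [p for n, p in model.named_parameters() if not n.startswith("bert")]
optimizer = torch.optim.Adam([
    {"params": other_params, "lr": 1e-4},
    {"params": bert_params, "lr": 1e-5},
])

# "Halved every two epochs after the tenth epoch", read as: multiply the base
# learning rate by 0.5 at epoch 10 and again every two epochs thereafter.
def lr_lambda(epoch: int) -> float:
    return 1.0 if epoch < 10 else 0.5 ** ((epoch - 10) // 2 + 1)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_lambda)

# Typical use: step the scheduler once per epoch after the optimizer steps.
for epoch in range(14):
    scheduler.step()
```

The same lambda is applied to both parameter groups, so the reported 10:1 ratio between the model and BERT learning rates is preserved throughout training; the batch size of 11 and the 800/1333 resizing would live in the dataloader, which is omitted here.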