Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
EOV-Seg: Efficient Open-Vocabulary Panoptic Segmentation
Authors: Hongwei Niu, Jie Hu, Jianghang Lin, Guannan Jiang, Shengchuan Zhang
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that EOV-Seg runs faster and achieves competitive performance compared with state-of-the-art methods. In particular, with COCO training only, EOV-Seg achieves 24.5 PQ, 32.1 mIoU, and 11.6 FPS on the ADE20K dataset. When taking ResNet50 as backbone, it runs 23.8 FPS with only 71M parameters on a single RTX 3090 GPU. |
| Researcher Affiliation | Collaboration | 1Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China 2Institute of Artificial Intelligence, Xiamen University, Fujian, China 3National University of Singapore, Singapore 4Contemporary Amperex Technology Co., Limited (CATL), Fujian, China |
| Pseudocode | No | The paper describes methods using mathematical equations and textual descriptions, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide links to code repositories. |
| Open Datasets | Yes | We only use the COCO Panoptic (Lin et al. 2014) dataset for training, with a crop size of 1024 × 1024. We evaluated our EOV-Seg for open-vocabulary semantic, instance, and panoptic segmentation on the ADE20K (Zhou et al. 2017) dataset, and for semantic segmentation on the ADE20K (Zhou et al. 2017), PASCAL Context (Mottaghi et al. 2014) and PASCAL VOC (Everingham et al. 2010) datasets. |
| Dataset Splits | Yes | We only use the COCO Panoptic (Lin et al. 2014) dataset for training, with a crop size of 1024 × 1024. We evaluated our EOV-Seg for open-vocabulary semantic, instance, and panoptic segmentation on the ADE20K (Zhou et al. 2017) dataset, and for semantic segmentation on the ADE20K (Zhou et al. 2017), PASCAL Context (Mottaghi et al. 2014) and PASCAL VOC (Everingham et al. 2010) datasets. During inference, the shortest side of input images is resized to 640, while ensuring the longer side does not exceed 2560. |
| Hardware Specification | Yes | We train our EOV-Seg on 4 NVIDIA 3090 GPUs for a total of 200k iterations. ... equipped with ResNet50 backbone, EOV-Seg runs 23.8 FPS with only 71M parameters on a single RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions models and frameworks such as 'CLIP (Radford et al. 2021)', but does not provide specific version numbers for any software libraries or dependencies. |
| Experiment Setup | Yes | Following prior works (Xu et al. 2023a; Yu et al. 2023), the training batch size is 16. We train our EOV-Seg on 4 NVIDIA 3090 GPUs for a total of 200k iterations. We only use the COCO Panoptic (Lin et al. 2014) dataset for training, with a crop size of 1024 × 1024. During inference, the shortest side of input images is resized to 640, while ensuring the longer side does not exceed 2560. |