Relationship Prompt Learning is Enough for Open-Vocabulary Semantic Segmentation
Authors: Jiahao Li, Yang Lu, Yuan Xie, Yanyun Qu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method in both zero-shot and open-vocabulary settings. |
| Researcher Affiliation | Academia | (1) School of Informatics, Xiamen University; (2) Institute of Artificial Intelligence, Xiamen University; (3) Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University; (4) School of Computer Science and Technology, East China Normal University; (5) Chongqing Institute of East China Normal University |
| Pseudocode | No | The paper includes mathematical formulations and diagrams but does not contain any structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | We provide an anonymous URL in the supplementary material. |
| Open Datasets | Yes | ADE20K[46] consists of 25k images for training and 2k images for validation. Pascal VOC 2012[47] includes 10,582 augmented training images and 1,449 validation images. COCOStuff164K[48] contains 118,287 training images and 5,000 validation images, with 171 classes in total. Pascal Context[49] consists of 10,100 images, of which 4,996 are used for training and 5,104 for validation, covering 60 classes. |
| Dataset Splits | Yes | ADE20K[46] consists of 25k images for training and 2k images for validation. Pascal VOC 2012[47] includes 10,582 augmented training images and 1,449 validation images. COCOStuff164K[48] contains 118,287 training images and 5,000 validation images, with 171 classes in total. Pascal Context[49] consists of 10,100 images, of which 4,996 are used for training and 5,104 for validation, covering 60 classes. |
| Hardware Specification | Yes | We conduct all experiments on eight NVIDIA GTX 3090 GPUs using the MMSegmentation tool [50]. To ensure a fair comparison, we report our results based on the open-source code from [19] and evaluate them with an input resolution of 512x512 on a single NVIDIA GTX 1080 Ti GPU. |
| Software Dependencies | No | The paper mentions using "MMSegmentation tool [50]" but does not specify a version number for this or any other software components used in the experiments. |
| Experiment Setup | Yes | We set the batch size of 4 for each GPU and set the input resolution to 512x512 pixels. The data augmentation strategy adheres to the default settings in MMSegmentation, which includes random image resizing with a short-side range of [256, 1024] and a crop size of 512x512. The optimizer is AdamW, initialized with a learning rate of 2e-5 and a weight decay of 1e-2. The learning rate follows a polynomial decay schedule with a power of 0.9. The number of iterations is set to 20K for the VOC dataset, 80K for the COCO dataset, and 40K for the Context dataset. We set λ1 and λ2 in Eq. 11 to 100 and 1, respectively. |
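
The schedule quoted in the Experiment Setup row can be sketched in plain Python to make the numbers concrete. This is a minimal illustration, not the authors' code: the function names `poly_lr` and `effective_batch_size` are hypothetical, the `min_lr=0.0` floor is an assumption (the quoted text does not state one), and the total batch size assumes all eight GPUs from the Hardware Specification row are used for training.

```python
# Hyperparameters quoted in the Experiment Setup row.
BASE_LR = 2e-5
WEIGHT_DECAY = 1e-2
POLY_POWER = 0.9
BATCH_PER_GPU = 4
NUM_GPUS = 8  # assumption: all eight GPUs from the Hardware Specification row
MAX_ITERS = {"VOC": 20_000, "COCO": 80_000, "Context": 40_000}


def poly_lr(iteration, max_iters, base_lr=BASE_LR, power=POLY_POWER, min_lr=0.0):
    """Polynomial learning-rate decay.

    lr = (base_lr - min_lr) * (1 - iteration / max_iters) ** power + min_lr

    min_lr is an assumed floor; the quoted setup does not specify one.
    """
    if not 0 <= iteration <= max_iters:
        raise ValueError("iteration must lie in [0, max_iters]")
    factor = (1.0 - iteration / max_iters) ** power
    return (base_lr - min_lr) * factor + min_lr


def effective_batch_size(per_gpu=BATCH_PER_GPU, num_gpus=NUM_GPUS):
    """Total images per optimizer step under plain data parallelism."""
    return per_gpu * num_gpus


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for name, iters in MAX_ITERS.items():
        midpoint = poly_lr(iters // 2, iters)
        print(f"{name}: {iters} iterations, lr at midpoint = {midpoint:.3e}")
```

With a power of 0.9 the curve is close to linear, so the learning rate at the midpoint of training is a little over half the initial value for every dataset, regardless of the iteration budget.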