Spectral Prompt Tuning: Unveiling Unseen Classes for Zero-Shot Semantic Segmentation
Authors: Wenhao Xu, Rongtao Xu, Changwei Wang, Shibiao Xu, Li Guo, Man Zhang, Xiaopeng Zhang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on two public datasets, we demonstrate the superiority of our method over state-of-the-art approaches, performing well across all classes and particularly excelling in handling unseen classes. |
| Researcher Affiliation | Academia | (1) School of Artificial Intelligence, Beijing University of Posts and Telecommunications; (2) State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences; (3) School of Artificial Intelligence, University of Chinese Academy of Sciences |
| Pseudocode | No | The paper presents architecture diagrams and mathematical equations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using the 'MMSegmentation open-source toolbox' and cites its GitHub link, but it does not provide an explicit statement or link for the source code of their own described methodology (SPT-SEG). |
| Open Datasets | Yes | We conducted extensive experiments on two benchmark datasets to evaluate the effectiveness of our proposed method: PASCAL VOC 2012 (20), COCO-Stuff 164K. |
| Dataset Splits | Yes | PASCAL VOC 2012: This dataset consists of 10,582 augmented images for training and 1,449 for validation. COCO-Stuff 164K: It is a large-scale dataset with 118,287 training images and 5,000 testing images. |
| Hardware Specification | Yes | All experiments were conducted on two H800 GPUs using the pre-trained CLIP ViT-B/16 model. |
| Software Dependencies | Yes | Our proposed method is implemented using the MMSegmentation open-source toolbox (Contributors 2020) with PyTorch 1.10.1. |
| Experiment Setup | Yes | The batch size was set to 16, and the images were resized to a resolution of 512 × 512. We performed a total of 20,000 training iterations on the PASCAL VOC 2012 dataset, and 96,000 iterations on the COCO-Stuff 164K dataset. ... The optimizer used was AdamW |
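
The reported settings in the table can be collected into a small Python structure, which also makes it easy to estimate how many passes over the training data the iteration counts imply. This is only a sketch: the key names and the `approx_epochs` helper are hypothetical and are not the authors' MMSegmentation config. It assumes the batch size of 16 is the total across both GPUs, and it leaves out the learning rate and other optimizer settings because the excerpt does not state them.

```python
# Hypothetical summary of the reported settings. Key names are illustrative
# and do not come from the authors' actual MMSegmentation config.
REPORTED_SETUP = {
    "batch_size": 16,          # assumed to be the total batch size, not per GPU
    "image_size": (512, 512),
    "optimizer": "AdamW",      # learning rate / weight decay are not given in the excerpt
    "max_iters": {"PASCAL VOC 2012": 20_000, "COCO-Stuff 164K": 96_000},
    "train_images": {"PASCAL VOC 2012": 10_582, "COCO-Stuff 164K": 118_287},
}


def approx_epochs(dataset: str, setup: dict = REPORTED_SETUP) -> float:
    """Approximate passes over the training set: iterations * batch size / images."""
    samples_seen = setup["max_iters"][dataset] * setup["batch_size"]
    return samples_seen / setup["train_images"][dataset]


if __name__ == "__main__":
    for name in REPORTED_SETUP["max_iters"]:
        print(f"{name}: about {approx_epochs(name):.1f} epochs")
```

Under these assumptions, the schedule corresponds to roughly 30 passes over the PASCAL VOC 2012 training set and roughly 13 over COCO-Stuff 164K.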