Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation
Authors: Fei Zhang, Tianfei Zhou, Boyang Li, Hao He, Chaofan Ma, Tianjiao Zhang, Jiangchao Yao, Ya Zhang, Yanfeng Wang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our proposed method achieves state-of-the-art performance on several benchmark datasets. The source code is available at https://github.com/Ferenas/PGSeg. |
| Researcher Affiliation | Academia | ¹CMIC, Shanghai Jiao Tong University; ²Shanghai AI Laboratory; ³Beijing Institute of Technology; ⁴National University of Defense Technology; ⁵CUHK |
| Pseudocode | Yes | Algorithm 1 Non-learnable Prototypical Regularization (NPR) |
| Open Source Code | Yes | The source code is available at https://github.com/Ferenas/PGSeg. |
| Open Datasets | Yes | Following [50, 41, 35, 51], we use CC12M [7] and Red Caps [12] as the training sets, and each of them contains 12 million image-text pairs. |
| Dataset Splits | Yes | Table 1 shows the performance of these methods on the validation set of PASCAL VOC12; note that all methods here are trained simply with CC12M. Table 3 lists the mIoU of recent state-of-the-art (SOTA) methods on the validation splits of the PASCAL VOC12, PASCAL Context, and COCO datasets. |
| Hardware Specification | Yes | The whole training process is implemented on 4 A100 GPUs, each with 80 GB of memory. |
| Software Dependencies | No | The paper mentions using the 'Adam [25] optimizer' but does not provide version numbers for software dependencies such as programming languages or libraries. |
| Experiment Setup | Yes | We set the batch size as 4096, and use the cosine learning strategy with an initial learning rate of 1.6e-3. We train the PGSeg for 40 epochs with 5 epochs of linear warm-up. As the generated features are unreliable in early epochs, we set λ = β = 0 at the first 30 epochs. For the selecting threshold ϕ of HRS in NPR, we experimentally set it to 0.1. |
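The training schedule quoted above (cosine decay with 5 warm-up epochs over 40 epochs, and the prototype-loss weights λ and β zeroed for the first 30 epochs) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are made up, and the post-epoch-30 values of λ and β are not stated in this report, so placeholders are used.

```python
import math

BASE_LR = 1.6e-3      # initial learning rate from the reported setup
EPOCHS = 40           # total training epochs
WARMUP_EPOCHS = 5     # linear warm-up epochs
LAMBDA = BETA = 1.0   # hypothetical placeholders; actual values not given here


def lr_at(epoch: int) -> float:
    """Learning rate for a 0-indexed epoch: linear warm-up, then cosine decay."""
    if epoch < WARMUP_EPOCHS:
        # Ramp linearly from BASE_LR / WARMUP_EPOCHS up to BASE_LR.
        return BASE_LR * (epoch + 1) / WARMUP_EPOCHS
    # Cosine decay from BASE_LR toward 0 over the remaining epochs.
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))


def loss_weights(epoch: int) -> tuple[float, float]:
    """(λ, β): zero while the generated features are unreliable (first 30 epochs)."""
    return (0.0, 0.0) if epoch < 30 else (LAMBDA, BETA)
```

For example, `lr_at(4)` returns the full `BASE_LR` at the end of warm-up, and `loss_weights(29)` is `(0.0, 0.0)` while `loss_weights(30)` switches the regularization terms on.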