Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation

Authors: Fei Zhang, Tianfei Zhou, Boyang Li, Hao He, Chaofan Ma, Tianjiao Zhang, Jiangchao Yao, Ya Zhang, Yanfeng Wang

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Experimental results show that our proposed method achieves state-of-the-art performance on several benchmark datasets. The source code is available at https://github.com/Ferenas/PGSeg." (Section 5, Experiments) |
| Researcher Affiliation | Academia | 1 CMIC, Shanghai Jiao Tong University; 2 Shanghai AI Laboratory; 3 Beijing Institute of Technology; 4 National University of Defense Technology; 5 CUHK |
| Pseudocode | Yes | Algorithm 1: Non-learnable Prototypical Regularization (NPR); an illustrative sketch follows this table. |
| Open Source Code | Yes | "The source code is available at https://github.com/Ferenas/PGSeg." |
| Open Datasets | Yes | "Following [50, 41, 35, 51], we use CC12M [7] and RedCaps [12] as the training sets, and each of them contains 12 million image-text pairs." |
| Dataset Splits | Yes | "Table 1 shows the performance of these methods on the validation set of PASCAL VOC12; note that all methods here are trained simply with CC12M. Table 3 lists the mIoU of recent state-of-the-art (SOTA) methods on the validation splits of PASCAL VOC12, PASCAL Context, and COCO datasets." |
| Hardware Specification | Yes | "The whole training process is implemented on 4 A100 GPUs, each with 80 GB of memory." |
| Software Dependencies | No | The paper mentions the "Adam [25] optimizer" but does not give version numbers for software dependencies such as programming languages or libraries. |
| Experiment Setup | Yes | "We set the batch size as 4096, and use the cosine learning strategy with an initial learning rate of 1.6e-3. We train the PGSeg for 40 epochs with 5 epochs of linear warm-up. As the generated features are unreliable in early epochs, we set λ = β = 0 at the first 30 epochs. For the selecting threshold ϕ of HRS in NPR, we experimentally set it to 0.1." A schedule sketch follows below. |