Understanding Multi-Granularity for Open-Vocabulary Part Segmentation

Authors: Jiho Choi, Seonho Lee, Seungho Lee, Minhyun Lee, Hyunjung Shim

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experimental results demonstrate that PartCLIPSeg outperforms existing state-of-the-art OVPS methods, offering refined segmentation and an advanced understanding of part relationships within images. |
| Researcher Affiliation | Academia | Jiho Choi¹, Seonho Lee¹, Seungho Lee², Minhyun Lee², Hyunjung Shim¹ (¹Graduate School of Artificial Intelligence, KAIST, Republic of Korea; ²School of Integrated Technology, Yonsei University, Republic of Korea) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/kaist-cvml/part-clipseg. |
| Open Datasets | Yes | We evaluate our method on three part segmentation datasets: Pascal-Part-116 [7, 46], ADE20K-Part-234 [46, 57], and PartImageNet [21]. |
| Dataset Splits | Yes | Pascal-Part-116 [7, 46] consists of 8,431 training images and 850 test images. ADE20K-Part-234 [46, 57] consists of 7,347 training images and 1,016 validation images. |
| Hardware Specification | Yes | All our experiments are conducted on 8 NVIDIA A6000 GPUs. |
| Software Dependencies | No | The paper mentions using CLIP ViT-B/16 and the AdamW optimizer but does not provide version numbers for software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | The model is trained using the AdamW optimizer with a base learning rate of 0.0001 over 20,000 iterations, with a batch size of 8 images. We employ a Warmup Poly LR scheduler to manage the learning rate throughout training. To ensure model stability, we apply gradient clipping with a maximum gradient norm of 0.01. |
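The Warmup Poly LR schedule named in the experiment setup can be sketched as a small pure-Python function. The base learning rate (1e-4) and iteration budget (20,000) come from the paper; the warmup length (1,000 steps) and decay power (0.9) are illustrative assumptions, since the paper only names the scheduler.

```python
def warmup_poly_lr(step, base_lr=1e-4, max_steps=20_000,
                   warmup_steps=1_000, power=0.9):
    """Warmup Poly LR sketch: linear warmup, then polynomial decay.

    base_lr and max_steps follow the paper's stated setup;
    warmup_steps and power are assumed values for illustration.
    """
    if step < warmup_steps:
        # Linear warmup from ~0 up to base_lr at the end of warmup.
        return base_lr * (step + 1) / warmup_steps
    # Polynomial decay of base_lr toward 0 over the remaining steps.
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return base_lr * (1.0 - progress) ** power
```

The function is continuous at the warmup boundary: step 999 and step 1,000 both yield the full base learning rate, after which the rate decays monotonically toward zero.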