Hierarchical Open-vocabulary Universal Image Segmentation

Authors: Xudong Wang, Shufan Li, Konstantinos Kallidromitis, Yusuke Kato, Kazuki Kozuka, Trevor Darrell

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | "We extensively benchmark HIPIE on various popular datasets to validate its effectiveness, including MSCOCO, ADE20K, Pascal Panoptic Part, and RefCOCO/RefCOCOg. HIPIE achieves state-of-the-art performance across all these datasets that cover a variety of tasks and granularity." And, from Section 4 (Experiments): "We comprehensively evaluate HIPIE through quantitative and qualitative analyses to demonstrate its effectiveness in performing various types of open-vocabulary segmentation and detection tasks. The implementation details of HIPIE are explained in Sec. 4.1. Sec. 4.2 presents the evaluation results of HIPIE. Additionally, we conduct an ablation study of various design choices in Sec. 4.3."
Researcher Affiliation | Collaboration | Xudong Wang¹, Shufan Li¹, Konstantinos Kallidromitis², Yusuke Kato², Kazuki Kozuka², Trevor Darrell¹. ¹Berkeley AI Research, UC Berkeley; ²Panasonic AI Research
Pseudocode | No | No structured pseudocode or algorithm blocks are present in the paper.
Open Source Code | No | The paper provides a project page (http://people.eecs.berkeley.edu/xdwang/projects/HIPIE), but this is a project page rather than an explicit statement of code release or a direct link to a code repository.
Open Datasets | Yes | "We extensively benchmark HIPIE on various popular datasets to validate its effectiveness, including MSCOCO, ADE20K, Pascal Panoptic Part, and RefCOCO/RefCOCOg." The datasets are also cited, e.g., "[34] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In Computer Vision - ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740-755. Springer, 2014." (See the dataset-loading sketch after the table.)
Dataset Splits | No | "We report PQ for panoptic segmentation on MSCOCO, APmask for instance segmentation on MSCOCO, and oIoU for referring segmentation on RefCOCO's validation set." This confirms the use of validation sets, but the paper does not specify explicit splits (e.g., percentages or sample counts for train/validation/test). (The oIoU metric is sketched after the table.)
Hardware Specification | Yes | "We train all our models on NVIDIA A100 GPUs with a batch size of 2 per GPU using AdamW [38] optimizer." (An optimizer-setup sketch follows the table.)
Software Dependencies | No | The paper mentions various models and optimizers (e.g., "AdamW [38]", "BERT model [6]", "CLIP [43]"), but does not specify their version numbers or the versions of underlying software dependencies such as Python or PyTorch. (See the environment-logging sketch after the table.)
Experiment Setup | Yes | "HIPIE is first pre-trained on Objects365 [48] for 340k iterations, using a batch size of 64 and a learning rate of 0.0002, and the learning rate is dropped by a factor of 10 after the 90th percentile of the schedule. After the pre-training stage, we finetune HIPIE on COCO [34], RefCOCO, RefCOCOg, and RefCOCO+ [41, 61] jointly for 120k iterations, using a batch size of 32 and a learning rate of 0.0002." For the loss functions: "λcls = 2.0, λmask = 5.0, λbox = 5.0, λce = 1.0, λdice = 1.0, λL1 = 1.0, λgiou = 0.2." (A schedule-and-loss sketch follows the table.)
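
Dataset-loading sketch (for the Open Datasets row): the quoted benchmarks are standard public datasets. As a minimal illustration, MSCOCO annotations can be read with pycocotools; the annotation path below is a hypothetical local path, not something the paper specifies.

```python
# Minimal sketch: loading MSCOCO annotations with pycocotools.
# The file path is illustrative; the paper does not give download locations.
from pycocotools.coco import COCO

coco = COCO("annotations/instances_val2017.json")  # hypothetical local path
img_ids = coco.getImgIds()
print(f"{len(img_ids)} images in the val2017 split")
```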
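
oIoU sketch (for the Dataset Splits row): overall IoU for referring segmentation accumulates intersections and unions over the whole evaluation split before dividing, rather than averaging per-image IoUs. A minimal sketch assuming binary NumPy masks; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def overall_iou(pred_masks, gt_masks):
    """Overall IoU (oIoU): sum of intersections / sum of unions over a split."""
    inter, union = 0, 0
    for pred, gt in zip(pred_masks, gt_masks):
        pred, gt = pred.astype(bool), gt.astype(bool)
        inter += np.logical_and(pred, gt).sum()
        union += np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 0.0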
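
Optimizer sketch (for the Hardware Specification row): a minimal PyTorch rendering of the quoted setup, using the learning rate reported in the Experiment Setup row. The stand-in model, dataset, and weight-decay value are assumptions, since HIPIE's code is not released.

```python
import torch

model = torch.nn.Linear(256, 80)  # stand-in module; HIPIE itself is not released
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-4,            # learning rate reported in the paper
    weight_decay=0.05,  # assumed value; the paper does not report weight decay
)
# Batch size of 2 per GPU, as quoted; the dataset here is a placeholder.
loader = torch.utils.data.DataLoader(torch.arange(8), batch_size=2)
```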
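
Environment-logging sketch (for the Software Dependencies row): since no versions are pinned in the paper, a reproduction would need to record them itself. A minimal sketch assuming a PyTorch/torchvision stack, which the paper implies but never confirms with version numbers.

```python
import sys
import torch, torchvision

# Record the exact environment alongside any reproduction attempt.
print("python     ", sys.version.split()[0])
print("torch      ", torch.__version__)
print("torchvision", torchvision.__version__)
print("cuda       ", torch.version.cuda)
```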
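
Schedule-and-loss sketch (for the Experiment Setup row): the quoted pre-training schedule (340k iterations with a 10x learning-rate drop after the 90th percentile) maps naturally onto a milestone scheduler, and the reported λ values define a weighted sum of loss terms. A minimal sketch assuming PyTorch; the loss-term keys mirror the λ subscripts, but the surrounding code is illustrative, not the authors' implementation.

```python
import torch

model = torch.nn.Linear(256, 80)  # stand-in module
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)

total_iters = 340_000  # pre-training length reported in the paper
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer,
    milestones=[int(0.9 * total_iters)],  # drop after the 90th percentile
    gamma=0.1,                            # by a factor of 10
)

# Loss weights exactly as reported in the paper.
LOSS_WEIGHTS = {
    "cls": 2.0, "mask": 5.0, "box": 5.0,
    "ce": 1.0, "dice": 1.0, "l1": 1.0, "giou": 0.2,
}

def total_loss(terms):
    """terms: dict mapping loss names (matching LOSS_WEIGHTS) to scalar tensors."""
    return sum(LOSS_WEIGHTS[name] * value for name, value in terms.items())
```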