Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation
Authors: Yang Miao, Jan-Nico Zaech, Xi Wang, Fabien Despinoy, Danda Pani Paudel, Luc Van Gool
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate LangHOPS across multiple challenging scenarios, including in-domain and cross-dataset object-part instance segmentation, and zero-shot semantic segmentation. LangHOPS achieves state-of-the-art results, surpassing previous methods by 5.5% Average Precision (AP) (in-domain) and 4.8% (cross-dataset) on the PartImageNet dataset and by 2.5% mIoU on unseen object parts in ADE20K (zero-shot). Ablation studies further validate the effectiveness of the language-grounded hierarchy and MLLM-driven part query refinement strategy. |
| Researcher Affiliation | Collaboration | Yang Miao: INSAIT, Sofia University "St. Kliment Ohridski"; Jan-Nico Zaech: INSAIT, Sofia University "St. Kliment Ohridski"; Xi Wang: INSAIT, Sofia University "St. Kliment Ohridski", ETH Zurich, TU Munich; Fabien Despinoy: Toyota Motor Europe; Danda Pani Paudel: INSAIT, Sofia University "St. Kliment Ohridski"; Luc Van Gool: INSAIT, Sofia University "St. Kliment Ohridski" |
| Pseudocode | No | The paper describes the method and equations in detail within the |
| Open Source Code | No | The code will be released here. |
| Open Datasets | Yes | We conduct experiments in multiple settings (in-domain, cross-dataset and zero-shot) and on multiple datasets (PartImageNet, Pascal-Part-116 and ADE20K). ...object-level datasets INS (consisting of COCO [28], Visual Genome [21] and LVIS [14], with object annotations) and part-level datasets (PART consisting of ADE20K [67], SA1B [20] and PACO [42], with object and part annotations). |
| Dataset Splits | Yes | Experiment Setup. We follow the setup proposed in VLPart [50] where each method is trained on one base dataset and evaluated on another unseen dataset, without finetuning. Two settings are implemented: Pascal-Part-116 [55] → PartImageNet [16] and PartImageNet → Pascal-Part-116 (i.e., the model is trained on Pascal-Part-116 and evaluated on PartImageNet, and vice versa). |
| Hardware Specification | Yes | The training is conducted on 4 x H200 GPUs with a batch size of 16. |
| Software Dependencies | No | We utilize PaliGemma 2 [49], a lightweight and state-of-the-art MLLM that takes the image I, the concatenated object-part queries P0, and prompt guidance as input and implements object-part parsing in our framework. The parameters of the Swin-L backbone and Mask DINO decoder are initialized with the pre-trained checkpoints from GLEE [18]. |
| Experiment Setup | Yes | The hyperparameters are set to λcls = 4, λbbox = 2, λmask = 5, L = 9. The number of repeated part queries Np = 3. The training is conducted on 4 x H200 GPUs with a batch size of 16. |
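
For readers who want to carry the reported setup into their own experiments, the values quoted in the Experiment Setup and Hardware Specification rows can be collected in a small configuration object. This is a minimal sketch, not the authors' code (which has not been released): the class name `ReportedSetup`, its field names, and both helper methods are hypothetical. It also rests on two assumptions the summary does not confirm: that the batch size of 16 is global and split evenly across the 4 GPUs, and that the λ values weight a plain sum of classification, box, and mask loss terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Hyperparameters as quoted in the table above (names are hypothetical)."""

    lambda_cls: float = 4.0     # λcls
    lambda_bbox: float = 2.0    # λbbox
    lambda_mask: float = 5.0    # λmask
    L: int = 9                  # L as reported; its meaning is not given in this summary
    num_part_queries: int = 3   # Np, the number of repeated part queries
    num_gpus: int = 4           # 4 x H200
    batch_size: int = 16        # assumed to be the global batch size

    def per_gpu_batch(self) -> int:
        # Assumption: the global batch is split evenly across the GPUs.
        return self.batch_size // self.num_gpus

    def total_loss(self, cls_loss: float, bbox_loss: float, mask_loss: float) -> float:
        # Assumption: the λ values weight a plain sum of the three loss terms.
        return (
            self.lambda_cls * cls_loss
            + self.lambda_bbox * bbox_loss
            + self.lambda_mask * mask_loss
        )


if __name__ == "__main__":
    setup = ReportedSetup()
    print("per-GPU batch:", setup.per_gpu_batch())
    print("weighted loss:", setup.total_loss(cls_loss=0.5, bbox_loss=0.25, mask_loss=0.1))
```

Keeping the values in a frozen dataclass makes it harder to change a reported hyperparameter by accident when comparing a reimplementation against the paper.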