Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation

Authors: Yang Miao, Jan-Nico Zaech, Xi Wang, Fabien Despinoy, Danda Pani Paudel, Luc Van Gool

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate LangHOPS across multiple challenging scenarios, including in-domain and cross-dataset object-part instance segmentation, and zero-shot semantic segmentation. LangHOPS achieves state-of-the-art results, surpassing previous methods by 5.5% Average Precision (AP) (in-domain) and 4.8% (cross-dataset) on the PartImageNet dataset and by 2.5% mIoU on unseen object parts in ADE20K (zero-shot). Ablation studies further validate the effectiveness of the language-grounded hierarchy and MLLM-driven part query refinement strategy.
Researcher Affiliation	Collaboration	Yang Miao: INSAIT, Sofia University "St. Kliment Ohridski"; Jan-Nico Zaech: INSAIT, Sofia University "St. Kliment Ohridski"; Xi Wang: INSAIT, Sofia University "St. Kliment Ohridski", ETH Zurich, TU Munich; Fabien Despinoy: Toyota Motor Europe; Danda Pani Paudel: INSAIT, Sofia University "St. Kliment Ohridski"; Luc Van Gool: INSAIT, Sofia University "St. Kliment Ohridski"
Pseudocode	No	The paper describes the method and equations in detail within the
Open Source Code	No	The code will be released here.
Open Datasets	Yes	We conduct experiments in multiple settings (in-domain, cross-dataset and zero-shot) and on multiple datasets (PartImageNet, Pascal-Part-116 and ADE20K). ...object-level datasets INS (consisting of COCO [28], Visual Genome [21] and LVIS [14], with object annotations) and part-level datasets (PART consisting of ADE20K [67], SA1B [20] and PACO [42], with object and part annotations).
Dataset Splits	Yes	Experiment Setup. We follow the setup proposed in VLPart [50] where each method is trained on one base dataset and evaluated on another unseen dataset, without finetuning. Two settings are implemented: Pascal-Part-116 [55] → PartImageNet [16] and PartImageNet → Pascal-Part-116 (i.e., the model is trained on Pascal-Part-116 and evaluated on PartImageNet, and vice versa).
Hardware Specification	Yes	The training is conducted on 4 x H200 GPUs with a batch size of 16.
Software Dependencies	No	We utilize PaliGemma 2 [49], a lightweight and state-of-the-art MLLM that takes the image I, the concatenated object-part queries P0, and prompt guidance as input and implements object-part parsing in our framework. The parameters of the Swin-L backbone and Mask DINO decoder are initialized with the pre-trained checkpoints from GLEE [18].
Experiment Setup	Yes	The hyperparameters are set to λcls = 4, λbbox = 2, λmask = 5, L = 9. The number of repeated part queries Np = 3. The training is conducted on 4 x H200 GPUs with a batch size of 16.
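
As a rough illustration of how the weights quoted in the Experiment Setup row might be applied, the sketch below combines three loss terms as a weighted sum. This is an assumption, not the authors' implementation: the row gives only the values λcls = 4, λbbox = 2 and λmask = 5, and does not state how the terms are combined. The names `LOSS_WEIGHTS` and `total_loss` are hypothetical, and L and Np are left out because this row alone does not say how they enter training.

```python
# Weights quoted in the Experiment Setup row.
# The dictionary keys and the weighted-sum form are assumptions for illustration.
LOSS_WEIGHTS = {"cls": 4.0, "bbox": 2.0, "mask": 5.0}


def total_loss(components):
    """Return the weighted sum of per-term loss values.

    `components` maps each term name ("cls", "bbox", "mask") to a scalar loss.
    """
    missing = sorted(set(LOSS_WEIGHTS) - set(components))
    if missing:
        raise KeyError(f"missing loss terms: {missing}")
    return sum(LOSS_WEIGHTS[name] * components[name] for name in LOSS_WEIGHTS)


if __name__ == "__main__":
    # Placeholder loss values, not results from the paper.
    example = {"cls": 0.5, "bbox": 0.25, "mask": 0.1}
    print(total_loss(example))
```

The loss values in the usage example are placeholders chosen only to exercise the function; they are not figures reported by the paper.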