Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

COS3D: Collaborative Open-Vocabulary 3D Segmentation

Authors: Runsong Zhu, Ka-Hei Hui, Zhengzhe Liu, Qianyi Wu, Weiliang Tang, Shi Qiu, Pheng-Ann Heng, Chi-Wing Fu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate COS3D on two standard benchmarks for OV3DS. Both quantitative and qualitative results show that our method significantly outperforms existing approaches. Also, the ablation studies validate the effectiveness of our designs for both training and inference stages. Furthermore, we present three example applications of COS3D... 4 Experiments
Researcher Affiliation	Collaboration	1 The Chinese University of Hong Kong 2 Autodesk AI Lab 3 Lingnan University 4 Monash University
Pseudocode	No	The paper describes the adaptive Lang2Ins prompt refinement process in Section 3.3. It mentions 'More details on the algorithm and automatic threshold generation are provided in Supp.', indicating the algorithm might be in the supplementary material, but no pseudocode block or algorithm figure is present in the main paper.
Open Source Code	Yes	Both data and code of our work will be publicly available on github.
Open Datasets	Yes	We evaluate our method on the LeRF dataset [9]. ... evaluate performance using mIoU and mAcc for the 10 scenes selected by OpenGaussian [16]... ScanNetv2 [62]
Dataset Splits	No	The paper mentions using the LeRF dataset [9] and ScanNetv2 [62] for evaluation, and also mentions using 19, 15, and 10 categories from ScanNetv2, but it does not specify explicit training/test/validation splits (e.g., percentages or counts) for its experiments.
Hardware Specification	Yes	All experiments are conducted on a single RTX-4090 GPU.
Software Dependencies	No	The paper mentions using the official implementation of 3D-GS [23], CLIP [14], and SAM [47] as foundational models. However, it does not specify any version numbers for these software components or any other libraries/frameworks.
Experiment Setup	Yes	We adopt the official implementation of 3D-GS [23] with a default of 30K training iterations as our base architecture. For the instance field and instance-to-language mapping (e.g., MLPs version), we also set the training iterations to 30K by default, following common practice as in [10]. For kernel regression version in mapping, the function is directly formulated without requiring training. ... a predefined threshold τ (set to 0.5 by default)... Here, threshold T is based on the statistical value from instance field.