Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

COS3D: Collaborative Open-Vocabulary 3D Segmentation

Authors: Runsong Zhu, Ka-Hei Hui, Zhengzhe Liu, Qianyi Wu, Weiliang Tang, Shi Qiu, Pheng-Ann Heng, Chi-Wing Fu

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate COS3D on two standard benchmarks for OV3DS. Both quantitative and qualitative results show that our method significantly outperforms existing approaches. Also, the ablation studies validate the effectiveness of our designs for both training and inference stages. Furthermore, we present three example applications of COS3D... 4 Experiments
Researcher Affiliation | Collaboration | 1 The Chinese University of Hong Kong 2 Autodesk AI Lab 3 Lingnan University 4 Monash University
Pseudocode | No | The paper describes the adaptive Lang2Ins prompt refinement process in Section 3.3. It mentions 'More details on the algorithm and automatic threshold generation are provided in Supp.', indicating the algorithm might be in the supplementary material, but no pseudocode block or algorithm figure is present in the main paper.
Open Source Code | Yes | Both data and code of our work will be publicly available on github.
Open Datasets | Yes | We evaluate our method on the LeRF dataset [9]. ... evaluate performance using mIoU and mAcc for the 10 scenes selected by OpenGaussian [16]... ScanNetv2 [62]
Dataset Splits | No | The paper mentions using the LeRF dataset [9] and ScanNetv2 [62] for evaluation, and also mentions using 19, 15, and 10 categories from ScanNetv2, but it does not specify explicit training/test/validation splits (e.g., percentages or counts) for its experiments.
Hardware Specification | Yes | All experiments are conducted on a single RTX-4090 GPU.
Software Dependencies | No | The paper mentions using the official implementation of 3D-GS [23], CLIP [14], and SAM [47] as foundational models. However, it does not specify any version numbers for these software components or any other libraries/frameworks.
Experiment Setup | Yes | We adopt the official implementation of 3D-GS [23] with a default of 30K training iterations as our base architecture. For the instance field and instance-to-language mapping (e.g., MLPs version), we also set the training iterations to 30K by default, following common practice as in [10]. For kernel regression version in mapping, the function is directly formulated without requiring training. ... a predefined threshold τ (set to 0.5 by default)... Here, threshold T is based on the statistical value from instance field.
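
Illustrative note on the Experiment Setup entry: the quoted text says the kernel regression version of the mapping is "directly formulated without requiring training" and mentions a default threshold τ of 0.5. The sketch below shows what a training-free kernel regression can look like in general, using the standard Nadaraya-Watson estimator with a Gaussian kernel. It is not the paper's formulation: the kernel choice, the `bandwidth` parameter, the function names, and the idea of thresholding a list of relevance scores are all assumptions made for illustration.

```python
import math


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel on the squared Euclidean distance between two vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def kernel_regression(query, keys, values, bandwidth=1.0):
    """Nadaraya-Watson estimate: a kernel-weighted average of `values`.

    Nothing is fitted here; the mapping is defined directly by the
    (key, value) samples, which is why no training iterations are needed.
    """
    weights = [gaussian_kernel(query, k, bandwidth) for k in keys]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase bandwidth")
    dim = len(values[0])
    return [
        sum(w * v[d] for w, v in zip(weights, values)) / total
        for d in range(dim)
    ]


def select_by_threshold(scores, tau=0.5):
    """Hypothetical helper: indices whose score strictly exceeds tau."""
    return [i for i, s in enumerate(scores) if s > tau]


# Usage: two hypothetical instance features (keys), each paired with a
# language feature (values); the query sits midway between the two keys.
keys = [[0.0], [10.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
midpoint = kernel_regression([5.0], keys, values, bandwidth=1.0)
near_first = kernel_regression([0.0], keys, values, bandwidth=1.0)
selected = select_by_threshold([0.2, 0.7, 0.5], tau=0.5)
```

With a query equidistant from both keys the estimate is an even blend of the two values, while a query that coincides with the first key returns essentially the first value. The bandwidth controls how quickly the influence of distant samples decays.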