Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
COS3D: Collaborative Open-Vocabulary 3D Segmentation
Authors: Runsong Zhu, Ka-Hei Hui, Zhengzhe Liu, Qianyi Wu, Weiliang Tang, Shi Qiu, Pheng-Ann Heng, Chi-Wing Fu
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate COS3D on two standard benchmarks for OV3DS. Both quantitative and qualitative results show that our method significantly outperforms existing approaches. Also, the ablation studies validate the effectiveness of our designs for both training and inference stages. Furthermore, we present three example applications of COS3D... 4 Experiments |
| Researcher Affiliation | Collaboration | 1 The Chinese University of Hong Kong 2 Autodesk AI Lab 3 Lingnan University 4 Monash University |
| Pseudocode | No | The paper describes the adaptive Lang2Ins prompt refinement process in Section 3.3. It mentions 'More details on the algorithm and automatic threshold generation are provided in Supp.', indicating the algorithm might be in the supplementary material, but no pseudocode block or algorithm figure is present in the main paper. |
| Open Source Code | Yes | Both data and code of our work will be publicly available on github. |
| Open Datasets | Yes | We evaluate our method on the LERF dataset [9]. ... evaluate performance using mIoU and mAcc for the 10 scenes selected by OpenGaussian [16]... ScanNetv2 [62] |
| Dataset Splits | No | The paper mentions using the LERF dataset [9] and ScanNetv2 [62] for evaluation, and also mentions using 19, 15, and 10 categories from ScanNetv2, but it does not specify explicit training/validation/test splits (e.g., percentages or counts) for its experiments. |
| Hardware Specification | Yes | All experiments are conducted on a single RTX-4090 GPU. |
| Software Dependencies | No | The paper mentions using the official implementation of 3D-GS [23], CLIP [14], and SAM [47] as foundational models. However, it does not specify any version numbers for these software components or any other libraries/frameworks. |
| Experiment Setup | Yes | We adopt the official implementation of 3D-GS [23] with a default of 30K training iterations as our base architecture. For the instance field and instance-to-language mapping (e.g., MLPs version), we also set the training iterations to 30K by default, following common practice as in [10]. For kernel regression version in mapping, the function is directly formulated without requiring training. ... a predefined threshold τ (set to 0.5 by default)... Here, threshold T is based on the statistical value from instance field. |