A Unified Framework for 3D Scene Understanding

Authors: Wei Xu, Chunsheng Shi, Sifan Tu, Xin Zhou, Dingkang Liang, Xiang Bai

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on three benchmarks, including ScanNet20, ScanRefer, and ScanNet200, demonstrate that the UniSeg3D consistently outperforms current SOTA methods, even those specialized for individual tasks.
Researcher Affiliation | Academia | Wei Xu, Chunsheng Shi, Sifan Tu, Xin Zhou, Dingkang Liang, Xiang Bai; Huazhong University of Science and Technology; {wxu2023, csshi, dkliang, xbai}@hust.edu.cn
Pseudocode | No | The paper describes its methodology in Section 3 and illustrates it with Fig. 2, but it does not provide formal pseudocode or algorithm blocks.
Open Source Code | Yes | Code and models are available at https://dk-liang.github.io/UniSeg3D/.
Open Datasets | Yes | Datasets. We evaluate the UniSeg3D on three benchmarks: ScanNet20 [6], ScanRefer [1], and ScanNet200 [47].
Dataset Splits | Yes | All models are trained for 512 epochs on a single NVIDIA RTX 4090 GPU and evaluated per 16 epochs on the validation set to find the best-performed model.
Hardware Specification | Yes | All models are trained for 512 epochs on a single NVIDIA RTX 4090 GPU and evaluated per 16 epochs on the validation set to find the best-performed model.
Software Dependencies | No | The paper mentions using a 'frozen CLIP [46] text encoder' but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | We adopt the AdamW optimizer with the polynomial schedule, setting an initial learning rate as 0.0001 and the weight decay as 0.05. All models are trained for 512 epochs on a single NVIDIA RTX 4090 GPU and evaluated per 16 epochs on the validation set to find the best-performed model.
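
The reported experiment setup (initial learning rate 0.0001, weight decay 0.05, 512 epochs, validation every 16 epochs) can be sketched in plain Python as below. This is an illustrative sketch, not the authors' training code. The paper excerpt does not state the exponent or floor of the polynomial schedule, so `power` and `min_lr` are assumed placeholder values. The function names `poly_lr` and `eval_epochs` are hypothetical, and so is the choice to count epochs from 1 so that the first evaluation falls on epoch 16.

```python
# Hyperparameters quoted in the Experiment Setup row.
BASE_LR = 1e-4        # initial learning rate
WEIGHT_DECAY = 0.05   # AdamW weight decay (listed for completeness)
TOTAL_EPOCHS = 512    # training length
EVAL_INTERVAL = 16    # validate once every 16 epochs


def poly_lr(epoch, base_lr=BASE_LR, total_epochs=TOTAL_EPOCHS,
            power=0.9, min_lr=0.0):
    """Polynomial decay of the learning rate over training.

    `power` and `min_lr` are assumptions; the source only says
    "polynomial schedule" without giving these values.
    """
    # Clamp progress to [0, 1] so out-of-range epochs stay well defined.
    progress = min(max(epoch, 0), total_epochs) / total_epochs
    return (base_lr - min_lr) * (1.0 - progress) ** power + min_lr


def eval_epochs(total_epochs=TOTAL_EPOCHS, interval=EVAL_INTERVAL):
    """Epochs (1-indexed, an assumption) at which validation would run."""
    return list(range(interval, total_epochs + 1, interval))


if __name__ == "__main__":
    for epoch in (0, 128, 256, 384, 512):
        print(f"epoch {epoch:3d}: lr = {poly_lr(epoch):.6e}")
    print(f"{len(eval_epochs())} validation runs, last at epoch {eval_epochs()[-1]}")
```

Under these assumptions the schedule starts at the stated 0.0001 and decays to `min_lr` by epoch 512, with 32 validation runs from which the best checkpoint would be chosen.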