Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
Authors: Wanhua Li, Yujie Zhao, Minghan Qin, Yang Liu, Yuanhao Cai, Chuang Gan, Hanspeter Pfister
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate that LangSplatV2 not only achieves better or competitive query accuracy but is also significantly faster. Codes and demos are available at our project page: https://langsplat-v2.github.io. |
| Researcher Affiliation | Collaboration | 1 Harvard University; 2 University of Chinese Academy of Sciences; 3 Tsinghua University; 4 Johns Hopkins University; 5 MIT-IBM Watson AI Lab; 6 UMass Amherst |
| Pseudocode | Yes | A Algorithm. The proposed efficient sparse coefficient splatting process is shown in Algorithm 1. |
| Open Source Code | Yes | Codes and demos are available at our project page: https://langsplat-v2.github.io. |
| Open Datasets | Yes | We evaluate our method on the LERF, 3D-OVS, and Mip-NeRF360 datasets. The LERF dataset [13], captured using the iPhone App Polycam, contains in-the-wild scenes. For the open-vocabulary 3D object localization task, we adopt the augmented localization annotations provided by LangSplat [12] on the LERF dataset. Additionally, we use the segmentation ground truth from LangSplat [12] for the open-vocabulary 3D segmentation task on LERF. Beyond LERF, we also conduct 3D segmentation experiments on the 3D-OVS and Mip-NeRF360 [66] datasets. |
| Dataset Splits | No | The paper mentions evaluating on specific "test scenes" like "Ramen", "Teatime", "Kitchen", and "Figurines" from the LERF dataset, and performing training for specific iterations, but does not provide explicit percentages, counts, or a detailed methodology for splitting the datasets into training, validation, and test sets. It implies using existing dataset structures but does not specify the splits used for reproduction. |
| Hardware Specification | Yes | In this paper, we introduce LangSplatV2, which achieves high-dimensional feature splatting at 476.2 FPS and 3D open-vocabulary text querying at 384.6 FPS for high-resolution images, providing a 42× speedup and a 47× boost over LangSplat respectively, along with improved query accuracy. LangSplat employs Gaussian Splatting to embed 2D CLIP language features into 3D, significantly enhancing speed and learning a precise 3D language field with SAM semantics. Such advancements in 3D language fields are crucial for applications that require language interaction within complex scenes. However, LangSplat does not yet achieve real-time inference performance (8.2 FPS), even with advanced A100 GPUs, severely limiting its broader application. |
| Software Dependencies | No | The paper mentions using 'OpenCLIP ViT-B/16 model' and 'ViT-H model for SAM [65]' and 'CUDA optimization' but does not provide specific version numbers for any programming languages, libraries, frameworks, or operating systems used in the experimental setup. |
| Experiment Setup | Yes | Implementation Details. Following LangSplat [12], we use the OpenCLIP ViT-B/16 model to extract CLIP features. We employ the ViT-H model for SAM [65] to segment images and obtain masks with three hierarchical semantics. The codebook size L is set to 64 and the K is set to 4. During test-time querying, we render three semantic scales simultaneously, leading to the actual rendering dimension of 12. The 3D Gaussians are first trained with RGB supervision for 30,000 iterations to reconstruct the RGB scene. Then we train another 10,000 iterations for the 3D sparse coefficient field by fixing all other 3D Gaussian parameters. All our experiments are conducted on one A100 GPU. |
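
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is an illustration only, not the authors' code: the class name `LangSplatV2Setup` and its field names are hypothetical, and the reading that the rendering dimension of 12 comes from three semantic scales times K = 4 is an assumption inferred from the quoted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LangSplatV2Setup:
    """Hypothetical container for the values quoted in the Experiment Setup row."""

    codebook_size: int = 64          # L in the quoted text
    top_k: int = 4                   # K in the quoted text
    semantic_scales: int = 3         # three hierarchical SAM semantic levels
    rgb_iterations: int = 30_000     # RGB-supervised training of the 3D Gaussians
    coeff_iterations: int = 10_000   # sparse coefficient field training, other parameters fixed

    @property
    def render_dim(self) -> int:
        # Assumption: scales rendered together, K coefficients per scale.
        return self.semantic_scales * self.top_k

    @property
    def total_iterations(self) -> int:
        # Both training stages combined.
        return self.rgb_iterations + self.coeff_iterations


setup = LangSplatV2Setup()
print("rendering dimension:", setup.render_dim)
print("total training iterations:", setup.total_iterations)
```

Under that assumption the derived rendering dimension matches the value of 12 stated in the quoted setup, and the two training stages sum to 40,000 iterations.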