Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
How Many Tokens Do 3D Point Cloud Transformer Architectures Really Need?
Authors: Tuan Tran Anh, Duy M. H. Nguyen, Hoai-Chau Tran, Michael Barz, Khoa D Doan, Roger Wattenhofer, Vien Ngo, Mathias Niepert, Daniel Sonntag, Paul Swoboda
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our method across multiple 3D vision tasks and show consistent improvements in computational efficiency. This work is the first to assess redundancy in large-scale 3D transformer models, providing insights into the development of more efficient 3D foundation architectures. Our code and checkpoints are publicly available at https://gitmerge3d.github.io. ... We evaluate our method on three 3D tasks: 3D Semantic Segmentation, 3D Reconstruction, and Object Detection. For semantic segmentation, we test our approach on Sonata [91] and PTv3 [92] across four datasets: ScanNet200 [72], ScanNet [16], S3DIS [2], and nuScenes [8]. For 3D reconstruction, we evaluate our method using SplatFormer [14] on three datasets: ShapeNet [10], Objaverse [18], and GSO [22]. |
| Researcher Affiliation | Collaboration | (1) German Research Centre for Artificial Intelligence (DFKI); (2) Max Planck Research School for Intelligent Systems (IMPRS-IS); (3) University of Stuttgart; (4) VinUni-Illinois Smart Health Center, VinUniversity; (5) College of Engineering & Computer Science, VinUniversity; (6) ETH Zurich; (7) VinRobotics; (8) University of Oldenburg; (9) Heinrich Heine University Düsseldorf |
| Pseudocode | Yes | Algorithm 1: Algorithm for Globally Informed Token Merging |
| Open Source Code | Yes | Our code and checkpoints are publicly available at https://gitmerge3d.github.io. |
| Open Datasets | Yes | For semantic segmentation, we test our approach on Sonata [91] and PTv3 [92] across four datasets: ScanNet200 [72], ScanNet [16], S3DIS [2], and nuScenes [8]. For 3D reconstruction, we evaluate our method using SplatFormer [14] on three datasets: ShapeNet [10], Objaverse [18], and GSO [22]. |
| Dataset Splits | Yes | We evaluate our method on three 3D tasks: 3D Semantic Segmentation, 3D Reconstruction, and Object Detection. For semantic segmentation, we test our approach on Sonata [91] and PTv3 [92] across four datasets: ScanNet200 [72], ScanNet [16], S3DIS [2], and nuScenes [8]. ... Table 3: Comparison of semantic segmentation performance and efficiency on the nuScenes validation set [8]. ... Table 1: We compare our method, using a merge rate of 0.8, in two settings (fine-tuned, blue rows; off-the-shelf, gray rows) against other segmentation and point cloud downsampling methods applied to PTv3. Columns: Methods, ScanNet Val, ScanNet200 Val, S3DIS Area 5. |
| Hardware Specification | No | Table 7: Sonata downstream task training performance with and without our token merging method (first row and second row respectively). Sonata-ft: mIoU 79.0, mAcc 86.0, allAcc 92.7, GPU Mem 211.25 GB, GPU Hours 55.2; + Ours: mIoU 78.9, mAcc 85.3, allAcc 92.3, GPU Mem 74.95 GB, GPU Hours 28.3. |
| Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers, such as PyTorch 1.9 or Python 3.8. |
| Experiment Setup | Yes | Fine-tuning with only 10% of the original training epochs, our merging strategy significantly outperforms others in efficiency. At 80% merging for high-energy and 97% for low-energy branches (K = 32), performance remains unaffected. ... We use a threshold τ to decide which patches P to aggressively merge. ... Through empirical evaluation, we found that a kernel size of 127 yields the best performance. |
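
The efficiency figures quoted under Hardware Specification (Table 7) can be checked with a few lines of arithmetic. The sketch below uses only the numbers transcribed from that row, labeled by their row names (Sonata-ft and + Ours); the dictionary names and the `relative_savings` helper are illustrative and are not part of the paper's released code.

```python
from typing import Tuple

# Values transcribed from Table 7 as quoted above (row labels "Sonata-ft" and "+ Ours").
# Dictionary names are illustrative, not taken from the paper's code.
SONATA_FT = {"mIoU": 79.0, "gpu_mem_gb": 211.25, "gpu_hours": 55.2}
WITH_MERGING = {"mIoU": 78.9, "gpu_mem_gb": 74.95, "gpu_hours": 28.3}


def relative_savings(before: float, after: float) -> Tuple[float, float]:
    """Return (percent reduction, before/after ratio) for a cost metric."""
    reduction_pct = 100.0 * (before - after) / before
    ratio = before / after
    return reduction_pct, ratio


if __name__ == "__main__":
    for key in ("gpu_mem_gb", "gpu_hours"):
        pct, ratio = relative_savings(SONATA_FT[key], WITH_MERGING[key])
        print(f"{key}: -{pct:.1f}% ({ratio:.2f}x)")
    print(f"mIoU change: {WITH_MERGING['mIoU'] - SONATA_FT['mIoU']:+.1f}")
```

Run as a script, this prints a 64.5% (about 2.82x) reduction in GPU memory and a 48.7% (about 1.95x) reduction in GPU hours, against a 0.1-point mIoU change.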
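
The Experiment Setup quotes mention patch-wise merging controlled by a threshold τ, merge rates of 80% for high-energy and 97% for low-energy patches, and K = 32. The following is a hypothetical NumPy sketch of how such a setup could work. It is not the authors' Algorithm 1. It rests on several assumptions: K is treated as the number of tokens per patch; the per-patch energy score is passed in as an input because its definition is not quoted here; and tokens are assumed to be ordered so that each run of K consecutive tokens forms a spatially coherent patch. The cosine-similarity assignment to evenly spaced destination tokens and the unmerge step, which copies merged features back to every point, were chosen for simplicity. All function names (`patch_merge_rates`, `merge_patch`, `merge_unmerge`) are hypothetical.

```python
import numpy as np


def patch_merge_rates(energy, tau, high_rate=0.8, low_rate=0.97):
    """Hypothetical rule: patches with energy >= tau keep more tokens (rate 0.8);
    the rest are merged aggressively (rate 0.97)."""
    energy = np.asarray(energy, dtype=float)
    return np.where(energy >= tau, high_rate, low_rate)


def merge_patch(x, rate):
    """Merge one patch x of shape [K, C] down to m = max(1, round((1 - rate) * K)) tokens.

    Illustrative choice: m evenly spaced tokens act as destinations, every token joins
    its most cosine-similar destination, and each group is averaged. Also returns the
    assignment so merged features can be copied back (unmerged) to all K positions.
    """
    k = x.shape[0]
    m = max(1, int(round((1.0 - rate) * k)))
    dst_idx = np.linspace(0, k - 1, m).astype(int)  # distinct indices because m <= k
    normed = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    sim = normed @ normed[dst_idx].T          # [K, m] cosine similarities
    assign = sim.argmax(axis=1)               # closest destination for each token
    assign[dst_idx] = np.arange(m)            # each destination maps to itself
    merged = np.zeros((m, x.shape[1]))
    np.add.at(merged, assign, x)              # sum features within each group
    counts = np.bincount(assign, minlength=m)[:, None]
    return merged / counts, assign            # group means, token-to-group map


def merge_unmerge(tokens, k, energy, tau):
    """Split ordered tokens [N, C] into patches of k tokens, merge each, then unmerge.

    Returns per-token output features and the total number of merged tokens.
    """
    rates = patch_merge_rates(energy, tau)
    out = np.empty_like(tokens, dtype=float)
    kept = 0
    for p, start in enumerate(range(0, tokens.shape[0], k)):
        patch = tokens[start:start + k]
        merged, assign = merge_patch(patch, rates[p])
        kept += merged.shape[0]
        # A transformer block would run on `merged` here; this sketch skips it.
        out[start:start + k] = merged[assign]
    return out, kept


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(128, 16))          # 4 patches of K = 32 tokens
    energy = np.array([0.9, 0.1, 0.7, 0.2])      # hypothetical per-patch scores
    out, kept = merge_unmerge(tokens, 32, energy, tau=0.5)
    print(out.shape, kept)
```

In the usage example, two high-energy and two low-energy patches of 32 tokens shrink from 128 tokens to 6 + 1 + 6 + 1 = 14 merged tokens. The unmerged output still has one feature per point, which dense tasks such as semantic segmentation require.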