Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
GeoSVR: Taming Sparse Voxels for Geometrically Accurate Surface Reconstruction
Authors: Jiahe Li, Jiawei Zhang, Youmin Zhang, Xiao Bai, Jin Zheng, Xiaohan Yu, Lin Gu
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate our superior performance compared to existing methods across diverse challenging scenarios, excelling in geometric accuracy, detail preservation, and reconstruction completeness while maintaining high efficiency. Code is available at https://github.com/Fictionarry/GeoSVR. 39th Conference on Neural Information Processing Systems (NeurIPS 2025). |
| Researcher Affiliation | Collaboration | (1) School of Computer Science and Engineering, State Key Laboratory of Complex Critical Software Environment, Jiangxi Research Institute, Beihang University; (2) Rawmantic AI; (3) State Key Laboratory of Virtual Reality Technology and Systems, Beijing; (4) Macquarie University; (5) Tohoku University |
| Pseudocode | No | The paper describes the methods through textual explanations and mathematical equations, such as Eq. (4) for Voxel Geometric Uncertainty and Eq. (12) for the total objective loss function, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Extensive experiments demonstrate our superior performance compared to existing methods across diverse challenging scenarios, excelling in geometric accuracy, detail preservation, and reconstruction completeness while maintaining high efficiency. Code is available at https://github.com/Fictionarry/GeoSVR. |
| Open Datasets | Yes | We use the prevailing DTU, Tanks and Temples (TnT), and Mip-NeRF 360 datasets for evaluation. The scene selections of DTU and TnT are consistent with previous works [71, 59, 38, 28], preprocessed following 2DGS [28] and Neuralangelo [38]. The voxel size of TSDF is set to 0.002 for DTU and is calculated for TnT following PGSR [11]. The images in DTU and TnT are downsampled 2×, and those in Mip-NeRF 360 are downsampled 2× or 4× following [32] for indoor and outdoor scenes, respectively. The DTU dataset used in the experiment is preprocessed from 2DGS [28] through COLMAP [52, 51]. In the TnT dataset, we follow previous work to use 6 high-quality scenes from the Training Data split that provides publicly accessible ground truth for evaluation. The camera poses and scene boundary are translated by the script provided by Neuralangelo [38]. For Mip-NeRF 360, we use all 9 scenes for evaluation. The images are downsampled 2× or 4× following [32] for indoor scenes ("bonsai", "counter", "kitchen", "room") and outdoor scenes ("bicycle", "garden", "flowers", "stump", "treehill"), respectively. The camera poses are provided along with the dataset. |
| Dataset Splits | Yes | The scene selections of DTU and TnT are consistent with previous works [71, 59, 38, 28], preprocessed following 2DGS [28] and Neuralangelo [38]. The images in DTU and TnT are downsampled 2×, and those in Mip-NeRF 360 are downsampled 2× or 4× following [32] for indoor and outdoor scenes, respectively. Specifically, we follow the previous works to select 15 scans with ids of 24, 37, 40, 55, 63, 65, 69, 83, 97, 105, 106, 110, 114, 118, 122, and use the half-resolution images as the training data. The DTU dataset used in the experiment is preprocessed from 2DGS [28] through COLMAP [52, 51]. In the TnT dataset, we follow previous work to use 6 high-quality scenes from the Training Data split that provides publicly accessible ground truth for evaluation. The camera poses and scene boundary are translated by the script provided by Neuralangelo [38]. For Mip-NeRF 360, we use all 9 scenes for evaluation. The images are downsampled 2× or 4× following [32] for indoor scenes ("bonsai", "counter", "kitchen", "room") and outdoor scenes ("bicycle", "garden", "flowers", "stump", "treehill"), respectively. The camera poses are provided along with the dataset. |
| Hardware Specification | Yes | All experiments are conducted on RTX 3090 Ti GPUs. |
| Software Dependencies | No | Our code is implemented with PyTorch and CUDA kernels, built upon SVRaster [55]. In the experiments, we train each model with 20,000 iterations, with the learning rates for density and SHs at degree 0 and the others of 0.05, 0.01, and 0.00025 in Adam [33] optimizer. |
| Experiment Setup | Yes | Implementation Details. Our code is implemented with PyTorch and CUDA kernels, built upon SVRaster [55]. In the experiments, we train each model with 20,000 iterations, with the learning rates for density and SHs at degree 0 and the others of 0.05, 0.01, and 0.00025 in Adam [33] optimizer. We use Depth Anything V2 [69] to provide the depth cues. The patch size of 7 × 7 is used for patch warping, and γ in voxel dropout is set to 0.5 and 0.3 for DTU and TnT datasets. The Octree setups keep the same as in [55], and the prune interval is increased to 2,000 for finer expression. In our method, we use TSDF for mesh extraction. All experiments are conducted on RTX 3090 Ti GPUs. The total objective is composed of the photometric loss L_photo from SVRaster, the depth constraint L_D-unc from Eq. (7), NCC loss for geometry regularization, and the voxel regularizations in Sec. 3.3: L = L_photo + η L_D-unc + τ L_NCC + µ1 R_rec + µ2 R_sp. In this work, we set the weights of η = 0.1, τ = 0.01, µ1 = 10⁻⁵, and µ2 = 10⁻⁶, respectively. The voxel size of TSDF is set to 0.002 for DTU and is calculated for TnT following PGSR [11]. |
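
The hyperparameters quoted in the Experiment Setup row can be gathered into a small configuration, which may help when checking a reproduction against the reported values. The following is a minimal sketch in plain Python, not the authors' code: the class, field, and function names are hypothetical, the loss terms are stand-in scalars, and the exponents of µ1 and µ2 are assumed to be negative (10⁻⁵ and 10⁻⁶), since the sign appears to have been lost in text extraction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the Experiment Setup row; all field names are hypothetical."""
    iterations: int = 20_000
    lr_density: float = 0.05
    lr_sh_degree0: float = 0.01
    lr_other: float = 0.00025   # "the others" in the quoted text
    patch_size: int = 7         # 7 x 7 patches for patch warping
    prune_interval: int = 2_000
    tsdf_voxel_size_dtu: float = 0.002
    eta: float = 0.1            # weight of L_D-unc
    tau: float = 0.01           # weight of L_NCC
    mu1: float = 1e-5           # weight of R_rec (assumed negative exponent)
    mu2: float = 1e-6           # weight of R_sp (assumed negative exponent)


# Voxel-dropout gamma per dataset, as quoted: 0.5 for DTU and 0.3 for TnT.
VOXEL_DROPOUT_GAMMA = {"DTU": 0.5, "TnT": 0.3}


def total_objective(cfg: ReportedSetup, l_photo: float, l_d_unc: float,
                    l_ncc: float, r_rec: float, r_sp: float) -> float:
    """Weighted sum L = L_photo + eta*L_D-unc + tau*L_NCC + mu1*R_rec + mu2*R_sp."""
    return (l_photo
            + cfg.eta * l_d_unc
            + cfg.tau * l_ncc
            + cfg.mu1 * r_rec
            + cfg.mu2 * r_sp)


if __name__ == "__main__":
    cfg = ReportedSetup()
    # Stand-in scalar values for the individual terms, for illustration only.
    print(total_objective(cfg, l_photo=1.0, l_d_unc=1.0, l_ncc=1.0,
                          r_rec=0.0, r_sp=0.0))
```

In an actual training loop the five terms would be tensors produced by the renderer and the regularizers; only the weighting shown here is taken from the quoted text.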