Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
SPAZER: Spatial-Semantic Progressive Reasoning Agent for Zero-shot 3D Visual Grounding
Authors: Zhao Jin, Rong-Cheng Tu, Jingyi Liao, Wenhao Sun, Xiao Luo, Shunyu Liu, Dacheng Tao
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on ScanRefer and Nr3D benchmarks demonstrate that SPAZER significantly outperforms previous state-of-the-art zero-shot methods, achieving notable gains of 9.0% and 10.9% in accuracy. Our codes are available at https://github.com/JZ-9962/SPAZER. |
| Researcher Affiliation | Academia | 1 College of Computing and Data Science, Nanyang Technological University, Singapore 2 University of California, Los Angeles |
| Pseudocode | No | The paper describes the methodology in prose and through a framework diagram (Figure 2), but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our codes are available at https://github.com/JZ-9962/SPAZER. We will release the full codebase with data preprocessing, model implementation, and evaluation scripts upon publication. |
| Open Datasets | Yes | We evaluate our method on two widely-used 3D visual grounding benchmarks: ScanRefer [6] and Nr3D [1], which are built upon the ScanNet [9] dataset. |
| Dataset Splits | Yes | To enable fair comparison and reduce expenditure, our main experiments are conducted on the same ScanRefer and Nr3D subsets as [48]. We follow previous work VLM-Grounder [48] to evaluate our agent on the subset (250 selected samples) of each dataset. |
| Hardware Specification | Yes | The experiments involving Qwen2-VL-72B and Qwen2.5-VL-72B are conducted on multiple NVIDIA H100 GPUs. |
| Software Dependencies | Yes | Our agent adopts GPT-4o as the default VLM... The default VLM of our agent is GPT-4o (gpt-4o-2024-08-06). The experiments involving Qwen2-VL-72B and Qwen2.5-VL-72B are conducted on multiple NVIDIA H100 GPUs. |
| Experiment Setup | Yes | Our agent adopts GPT-4o as the default VLM. The number of views n is set to 4, and the Top-k parameter is set to k = 4. For ScanRefer dataset, we follow prior works [54, 22] and use a pre-trained model [34] to obtain the 3D bounding boxes. All ablation studies are conducted on the same subset of Nr3D as [48]. The temperature is set to 0.2 to improve the reproducibility of the results. |
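
For readers attempting a rerun, the settings quoted in the Experiment Setup and Dataset Splits rows can be collected in one place. The following is a minimal sketch only: the class name `ReportedSetup` and all field names are hypothetical and are not taken from the SPAZER codebase; only the values come from the rows above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ReportedSetup:
    """Settings quoted in the table above. Names are hypothetical, values are as reported."""

    vlm: str = "gpt-4o-2024-08-06"  # default VLM snapshot named in the Software Dependencies row
    num_views: int = 4              # n, the number of views
    top_k: int = 4                  # k, the Top-k parameter
    temperature: float = 0.2        # set low by the authors to improve reproducibility
    eval_subset_size: int = 250     # selected samples per dataset, following VLM-Grounder [48]

    def validate(self) -> None:
        """Basic sanity checks; these are assumptions of this sketch, not rules from the paper."""
        if self.num_views < 1 or self.top_k < 1:
            raise ValueError("num_views and top_k must be positive")
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")
        if self.eval_subset_size < 1:
            raise ValueError("eval_subset_size must be positive")


setup = ReportedSetup()
setup.validate()
print(asdict(setup))
```

Keeping the values in a frozen dataclass makes accidental changes between runs raise an error, which suits a setup that is meant to be held fixed across the ScanRefer and Nr3D evaluations.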