Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Spatially-aware Weights Tokenization for NeRF-Language Models
Authors: Andrea Amaduzzi, Pierluigi Zama Ramirez, Giuseppe Lisanti, Samuele Salti, Luigi Di Stefano
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Spatial LLaNA on NeRF captioning and NeRF Q&A tasks, using both existing benchmarks and our novel Spatial ObjaNeRF dataset consisting of 100 manually-curated language annotations for NeRFs. This dataset features 3D models and descriptions that challenge the spatial reasoning capability of MLLMs. Spatial LLaNA outperforms existing approaches across all tasks. |
| Researcher Affiliation | Academia | Andrea Amaduzzi EMAIL Pierluigi Zama Ramirez EMAIL Giuseppe Lisanti EMAIL Samuele Salti EMAIL Luigi Di Stefano EMAIL CVLab, University of Bologna |
| Pseudocode | Yes | In this Section, we describe the algorithm used to obtain feature vectors from our tri-plane representation, introduced in Section 3. ... Step 1: Map to image grid coordinates ... Step 2: Compute integer and fractional parts ... Step 3: Retrieve four neighboring features ... Step 4: Compute bilinear interpolation |
| Open Source Code | No | Our newly introduced dataset, Spatial ObjaNeRF, the source code and the weights for all our models will be publicly released in case of acceptance. |
| Open Datasets | Yes | To validate our approach, we thoroughly evaluate Spatial LLaNA on the existing NeRF-Language datasets: ShapeNeRF-Text [53], HST [5], and ObjaNeRF-Text [7]. Moreover, we propose a novel benchmark, Spatial ObjaNeRF, consisting of 100 NeRF models from Objaverse [18] paired with detailed language descriptions focusing on object parts and their spatial relations. |
| Dataset Splits | Yes | The model is trained on 300K NeRFs from ShapeNeRF-Text and ObjaNeRF-Text for 500 epochs with a learning rate of 0.00001, as done in LLaNA [7]. ... The total training set consists of approximately 300K NeRFs with textual annotations. ... Spatial ObjaNeRF provides detailed spatial annotations for a subset of 100 NeRFs from the GPT4Point test set of ObjaNeRF-Text. |
| Hardware Specification | Yes | The training needs 2 days on four 64GB A100 GPUs to reach convergence. ... S-LLaNA is implemented in PyTorch and trained on NVIDIA A100 GPUs with 64GB of VRAM each. The 7B model requires 8 GPUs for training, while the 13B model requires 16 GPUs. |
| Software Dependencies | No | S-LLaNA is implemented in PyTorch and trained on NVIDIA A100 GPUs with 64GB of VRAM each. ... We leverage AdamW [42] as optimizer, and a cosine learning rate scheduler. |
| Experiment Setup | Yes | The model is trained on 300K NeRFs from ShapeNeRF-Text and ObjaNeRF-Text for 500 epochs with a learning rate of 0.00001, as done in LLaNA [7]. ... Stage 1: Training on brief textual descriptions from ShapeNeRF-Text and ObjaNeRF-Text for 3 epochs, using a learning rate of 0.002. We leverage AdamW [42] as optimizer, and a cosine learning rate scheduler. Stage 2: Training on brief and detailed textual descriptions, along with Q&A conversations from the same datasets, for 3 epochs, with a learning rate of 0.00002. We leverage AdamW [42] as optimizer, and a cosine learning rate scheduler. |
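
The four steps quoted in the Pseudocode row (map to grid coordinates, split into integer and fractional parts, retrieve four neighbors, interpolate) can be illustrated with a minimal pure-Python sketch for a single feature plane. This is not the authors' code: the function name `bilinear_sample`, the nested-list plane layout, the `[-1, 1]` coordinate range, and the corner-aligned mapping are all assumptions made for illustration, and how the paper combines the three planes of the tri-plane is not covered here.

```python
import math


def bilinear_sample(plane, u, v):
    """Sample a feature vector from one plane by bilinear interpolation.

    plane: H x W x C nested lists of floats (hypothetical layout).
    u, v:  normalized coordinates, assumed to lie in [-1, 1].
    """
    height, width = len(plane), len(plane[0])

    # Step 1: map normalized coordinates to image grid coordinates
    # (assumed corner-aligned mapping), clamped to the valid range.
    x = min(max((u + 1.0) / 2.0 * (width - 1), 0.0), float(width - 1))
    y = min(max((v + 1.0) / 2.0 * (height - 1), 0.0), float(height - 1))

    # Step 2: compute integer and fractional parts.
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0

    # Step 3: retrieve the four neighboring feature vectors.
    f00, f01 = plane[y0][x0], plane[y0][x1]
    f10, f11 = plane[y1][x0], plane[y1][x1]

    # Step 4: compute the bilinear interpolation, channel by channel.
    return [
        (1 - fx) * (1 - fy) * a + fx * (1 - fy) * b
        + (1 - fx) * fy * c + fx * fy * d
        for a, b, c, d in zip(f00, f01, f10, f11)
    ]


# Toy 2 x 2 plane with a single channel.
toy_plane = [[[0.0], [1.0]], [[2.0], [3.0]]]
center = bilinear_sample(toy_plane, 0.0, 0.0)
```

Sampling the center of the toy plane weights all four corners equally, while sampling at a corner returns that corner's feature unchanged.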