Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Model

Authors: Yue Zhang, Zhiyang Xu, Ying Shen, Parisa Kordjamshidi, Lifu Huang

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results demonstrate that both our proposed dataset and alignment module significantly enhance the situated spatial understanding of 3D-based LLMs."
Researcher Affiliation | Academia | "1Michigan State University, 2Virginia Tech, 3University of Illinois at Urbana-Champaign, 4UC Davis, EMAIL, EMAIL, EMAIL"
Pseudocode | No | "The paper describes the proposed methods and system architecture through textual descriptions and figures (e.g., Fig. 3, 5, 7), but it does not include any explicitly labeled pseudocode or algorithm blocks."
Open Source Code | Yes | https://github.com/zhangyuejoslin/Spartun3D
Open Datasets | Yes | "To address the aforementioned issues, we propose two key innovations: we first introduce a scalable, LLM-generated dataset named Spartun3D... The 3D scenes in Spartun3D are taken from 3RScan (Wu et al., 2021), which provides a diverse set of realistic 3D environments. ... SQA3D (Ma et al., 2022) introduces a human-annotated dataset where the model generates answers based on questions and given situations."
Dataset Splits | Yes | "Table 1: Dataset statistics of Spartun3D and human validation results." Per-task examples (train/test): Captioning 10K (8,367/1,350); Attr. & Rel. 62K (61,254/8,168); Affordance 40K (35,070/5,017); Planning 21K (19,434/2,819).
Hardware Specification | Yes | "The model is trained on 6 NVIDIA RTX A6000 GPUs for around 30 hours with 15 epochs."
Software Dependencies | No | "The paper mentions several models and frameworks like PointNet++ (Qi et al., 2017), LEO (Huang et al., 2023), OPT-1.3B (Zhang et al., 2023b), Vicuna-7B (Chiang et al., 2023), and LoRA (Hu et al., 2021). However, it does not specify version numbers for any software dependencies or libraries used for implementation."
Experiment Setup | Yes | "The maximum context length and output length of the LLM are both set to 256. For each 3D scene, we sample up to 60 objects with 1,024 points per object. During training, the pre-trained 3D point cloud encoder and the LLM are frozen. We set rank and α in LoRA to 16 and the dropout rate to 0. During inference, we employ beam search to generate the textual response, and the number of beams is 5. The model is trained on 6 NVIDIA RTX A6000 GPUs for around 30 hours with 15 epochs. The learning rate is 3e-5, and the batch size is 24."
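For readers who want the reported setup in machine-readable form, the hyperparameters quoted above can be collected into a single configuration object. This is an illustrative sketch only: the class name `SpartunTrainConfig` and its field names are hypothetical, not taken from the authors' released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpartunTrainConfig:
    """Hyperparameters as reported in the paper's experiment setup.

    Note: field names are hypothetical; only the values come from the paper.
    """
    # LLM context and output lengths (both set to 256)
    max_context_len: int = 256
    max_output_len: int = 256
    # Scene sampling: up to 60 objects, 1,024 points per object
    max_objects: int = 60
    points_per_object: int = 1024
    # LoRA settings: rank 16, alpha 16, dropout 0
    # (pre-trained 3D point cloud encoder and LLM are frozen during training)
    lora_rank: int = 16
    lora_alpha: int = 16
    lora_dropout: float = 0.0
    # Inference: beam search with 5 beams
    num_beams: int = 5
    # Optimization: lr 3e-5, batch size 24, 15 epochs on 6 RTX A6000 GPUs
    learning_rate: float = 3e-5
    batch_size: int = 24
    epochs: int = 15
    num_gpus: int = 6


cfg = SpartunTrainConfig()
```

A frozen dataclass keeps the reported values immutable, so they can be logged or compared against a reproduction run without risk of accidental modification.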