Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

SpatialLM: Training Large Language Models for Structured Indoor Modeling

Authors: Yongsen Mao, Junhao Zhong, Chuan Fang, Jia Zheng, Rui Tang, Hao Zhu, Ping Tan, Zihan Zhou

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To train SPATIALLM, we collect a large-scale, high-quality synthetic dataset consisting of the point clouds of 12,328 indoor scenes (54,778 rooms) with groundtruth 3D annotations, and conduct a careful study on various modeling and training decisions. On public benchmarks, our model gives state-of-the-art performance in layout estimation and competitive results in 3D object detection.
Researcher Affiliation	Collaboration	(1) Manycore Tech Inc. (2) Hong Kong University of Science and Technology
Pseudocode	No	The paper describes the structured representation of layouts and objects in Figure 2 and provides an example conversation in Figure 12. However, it does not include a clearly labeled pseudocode or algorithm block for the core methodology of SPATIALLM.
Open Source Code	Yes	We have made our code, model, and dataset publicly available. All resources are available at https://manycore-research.github.io/SpatialLM/
Open Datasets	Yes	SPATIALLM dataset (ours) syn. 12,328... We have made our code, model, and dataset publicly available. All resources are available at https://manycore-research.github.io/SpatialLM/. ... For experiments on Structured3D [64] and ScanNet [13], we use a batch size of 8.
Dataset Splits	Yes	Finally, the dataset is divided into 11,328/500/500 scenes for training/validation/testing. ... We use the original data split of 3000/250/250 for training/validation/testing, respectively. ... The training set is composed of 1,201 scenes, while 312 scenes are used for validation/testing.
Hardware Specification	Yes	We utilized 32 NVIDIA H20 GPUs, and training on SPATIALLM dataset takes approximately one day.
Software Dependencies	No	The paper mentions specific models like Qwen2.5-0.5B and Sonata, and general optimizers like AdamW, but does not provide specific version numbers for software libraries or development environments (e.g., Python, PyTorch, CUDA) used in the experiments.
Experiment Setup	Yes	We train SPATIALLM for 4 epochs with a total batch size of 64. The learning rate is set at 10^-4, using a cosine scheduler with a warm-up ratio of 0.03. The parameters for AdamW optimizer are as follows: adam_beta1 is 0.9, adam_beta2 is 0.99, and adam_epsilon is 1 × 10^-8. ... We set the resolution at the finest level at 2.5cm.
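
The Experiment Setup row quotes a learning rate, a cosine scheduler, and a warm-up ratio, but not the exact shape of the schedule. The sketch below shows one common reading of those numbers: a linear warm-up to the peak rate followed by cosine decay to zero. The linear warm-up, the decay-to-zero floor, the function name `lr_at_step`, and the `total_steps` argument are all assumptions made for illustration; the paper's actual training code may differ.

```python
import math

# Values quoted in the Experiment Setup row.
PEAK_LR = 1e-4        # learning rate
WARMUP_RATIO = 0.03   # fraction of training steps spent warming up
ADAM_BETA1 = 0.9
ADAM_BETA2 = 0.99
ADAM_EPSILON = 1e-8


def lr_at_step(step: int, total_steps: int) -> float:
    """Learning rate at a given step (hypothetical helper).

    Assumed shape: linear warm-up from 0 to PEAK_LR, then cosine
    decay from PEAK_LR down to 0 over the remaining steps.
    """
    warmup_steps = round(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # total_steps is an arbitrary illustrative value, not taken from the paper.
    total_steps = 1000
    for step in (0, 15, 30, 500, 1000):
        print(step, lr_at_step(step, total_steps))
```

With a warm-up ratio of 0.03, only the first 3% of steps are spent ramping up; the rest of training follows the cosine curve. The Adam constants are listed only to keep the quoted hyperparameters in one place and are not used by the schedule itself.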