Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning

Authors: Jinkun Hao, Naifu Liang, Zhen Luo, Xudong XU, Weipeng Zhong, Ran Yi, Yichen Jin, Zhaoyang Lyu, Feng Zheng, Lizhuang Ma, Jiangmiao Pang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Exhaustive experiments demonstrate the superior performance of MesaTask compared to baselines in generating task-conforming tabletop scenes with realistic layouts.
Researcher Affiliation	Collaboration	1 Shanghai Jiao Tong University, 2 Shanghai AI Laboratory, 3 SII, 4 Southern University of Science and Technology, 5 Peking University
Pseudocode	No	The paper describes the 'Spatial Reasoning Chain' and a set of 'Rules to determine the spatial relationships between objects' in Table 4, but these are not presented in a structured pseudocode or algorithm block format.
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: We will release our data and code after paper is accepted.
Open Datasets	No	To support research on such a challenging task, we introduce MesaTask-10K, a large-scale dataset comprising approximately 10,700 synthetic tabletop scenes with manually crafted layouts that ensure realistic layouts and intricate inter-object relations. Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: We will release our data and code after paper is accepted.
Dataset Splits	Yes	We build our training data based on the training split of our MesaTask-10k dataset, which contains 10,000 tabletop scenes. For each scene, we generate five task instructions following the reasoning data creation process above, resulting in a total of 50,000 task-scene pairs for the supervised fine-tuning. During the stage of DPO training, we construct the paired dataset using 5,000 previously unseen scenes, where each normal layout sample corresponds to two disrupted layouts on average, thereby yielding a total of 10,000 positive-negative layout pairs for the DPO training.
Hardware Specification	Yes	All experiments are conducted on a cluster of eight A800 GPUs.
Software Dependencies	Yes	We adopt Qwen3-8b[31] as the base LLM for both supervised fine-tuning (SFT) and direct preference optimization (DPO).
Experiment Setup	Yes	In the SFT stage, the model is trained for one epoch using the learning rate of 1 × 10⁻⁵. In the DPO stage, we train for one epoch, with the learning rate of 1 × 10⁻⁶. We perform full-parameter fine-tuning in both stages.
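
The dataset-split and experiment-setup rows above can be cross-checked with a short sketch. This is not the authors' code (the table notes that none has been released); the class name StageConfig and its fields are hypothetical, and the learning rates are taken to be 1e-5 for SFT and 1e-6 for DPO. The sketch only records the reported settings and reproduces the sample counts quoted in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """Hypothetical container for one training stage as reported in the table."""
    name: str
    epochs: int
    learning_rate: float      # assumed reading of the reported learning rate
    num_scenes: int           # scenes used to build the stage's training data
    samples_per_scene: int    # instructions (SFT) or disrupted layouts (DPO) per scene

    @property
    def num_samples(self) -> int:
        # Total training samples = scenes x samples generated per scene.
        return self.num_scenes * self.samples_per_scene


# SFT: 10,000 training scenes, five task instructions each.
SFT = StageConfig("sft", epochs=1, learning_rate=1e-5,
                  num_scenes=10_000, samples_per_scene=5)

# DPO: 5,000 unseen scenes, two disrupted layouts each *on average*,
# so the resulting pair count is an average-based figure.
DPO = StageConfig("dpo", epochs=1, learning_rate=1e-6,
                  num_scenes=5_000, samples_per_scene=2)


def summarize(stages):
    """Map each stage name to its total number of training samples."""
    return {stage.name: stage.num_samples for stage in stages}


if __name__ == "__main__":
    for stage in (SFT, DPO):
        print(f"{stage.name}: {stage.num_samples} samples, "
              f"lr={stage.learning_rate:g}, epochs={stage.epochs}")
```

Under these assumptions the counts agree with the quoted figures: 50,000 task-scene pairs for SFT and 10,000 positive-negative layout pairs for DPO.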