Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

Authors: Zekun Qi, Wenyao Zhang, Yufei Ding, Runpei Dong, XinQiang Yu, Jingwen Li, Lingyun Xu, Baoyu Li, Xialin He, Guofan Fan, Jiazhao Zhang, Jiawei He, Jiayuan Gu, Xin Jin, Kaisheng Ma, Zhizheng Zhang, He Wang, Li Yi

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrated the effectiveness and generalization of our SOFAR, e.g., zero-shot 48.7% successful rate on Open6DOR and zero-shot 74.9% successful rate on SIMPLER-Env.
Researcher Affiliation	Collaboration	1 Tsinghua University, 2 Shanghai Jiao Tong University, 3 Galbot, 4 Peking University, 5 UIUC, 6 ShanghaiTech University, 7 Eastern Institute of Technology, 8 Shanghai Qi Zhi Institute
Pseudocode	No	The paper describes methods through text and diagrams (e.g., Figure 4 for PointSO model architecture, Figure 5 for SOFAR system overview), but does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code	Yes	Project Page, GitHub Code, Hugging Face
Open Datasets	Yes	To support this, we construct OrienText300K, a large-scale dataset of 3D objects annotated with semantic orientations, and develop PointSO, a general model for zero-shot semantic orientation prediction. By integrating semantic orientation into VLM agents, our SOFAR framework enables 6-DoF spatial reasoning and generates robotic actions. Extensive experiments demonstrated the effectiveness and generalization of our SOFAR, e.g., zero-shot 48.7% successful rate on Open6DOR and zero-shot 74.9% successful rate on SIMPLER-Env.
Dataset Splits	Yes	To validate annotation quality, we construct a validation set containing 208 samples with manually labeled filtering criteria and semantic orientation labels, respectively. From Fig. 3b, we observe that GPT-4o achieves an average accuracy of 88.3% and 97.1% accuracy on filtering and annotating, respectively. This provides a quality guarantee of our OrienText300K. In the Open6DOR [25] task, we supplement the training dataset with additional samples retrieved and manually annotated from Objaverse [20], ensuring alignment with the object categories in the original benchmark. This dataset includes approximately 3,000 6-DoF object manipulation instructions.
Hardware Specification	Yes	GPU device: 8× H800
Software Dependencies	No	The paper mentions various software components and models (e.g., CLIP, PointNet, LLaMA, SAM, Florence-2, GPT-4o, robosuite, OMPL, GSNet, SAPIEN) but does not provide specific version numbers for these components to ensure reproducibility of software dependencies.
Experiment Setup	Yes	Table 14: Training recipes for PointSO and SOFAR-LLaVA. This table includes details such as optimizer (AdamW), learning rate (5e-5, 2e-5), weight decay (5e-2, 0), learning rate scheduler (cosine), training epochs (300, 50, 2), warmup epochs (10, 5, 0.03), batch size (256, 128), drop path rate (0.2), number of points (10000), number of point patches (512), point patch size (32), and augmentation strategies (Rot&Part&Noise, Rotation).
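
The Experiment Setup row names a cosine learning rate scheduler with warmup epochs but gives no formula. The sketch below shows one common way such a schedule is computed; it is not taken from the paper's code. The function name `cosine_lr` is hypothetical, the linear warmup shape and the zero floor are assumptions, and pairing 5e-5 with 300 epochs and 10 warmup epochs is only a guess at one column of Table 14, since the row does not say which values belong to which model.

```python
import math


def cosine_lr(epoch: float, base_lr: float, total_epochs: float,
              warmup_epochs: float, min_lr: float = 0.0) -> float:
    """Learning rate at a (possibly fractional) epoch.

    Assumed shape: linear warmup from 0 to base_lr, then cosine decay
    from base_lr down to min_lr at total_epochs.
    """
    if warmup_epochs > 0 and epoch < warmup_epochs:
        # Linear warmup (an assumption; the table only lists warmup epochs).
        return base_lr * epoch / warmup_epochs
    # Fraction of the decay phase completed, clamped to [0, 1].
    span = max(total_epochs - warmup_epochs, 1e-12)
    progress = min(max((epoch - warmup_epochs) / span, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Assumed pairing of values from the row above; not confirmed by the source.
    for epoch in (0, 5, 10, 155, 300):
        print(epoch, cosine_lr(epoch, base_lr=5e-5, total_epochs=300, warmup_epochs=10))
```

Accepting fractional epochs lets the same function cover a value such as 0.03, which reads more like a warmup ratio than an epoch count; how the authors interpret that entry is not stated in this row.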