Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

Authors: Zekun Qi, Wenyao Zhang, Yufei Ding, Runpei Dong, XinQiang Yu, Jingwen Li, Lingyun Xu, Baoyu Li, Xialin He, Guofan Fan, Jiazhao Zhang, Jiawei He, Jiayuan Gu, Xin Jin, Kaisheng Ma, Zhizheng Zhang, He Wang, Li Yi

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrated the effectiveness and generalization of our SOFAR, e.g., zero-shot 48.7% successful rate on Open6DOR and zero-shot 74.9% successful rate on SIMPLER-Env.
Researcher Affiliation | Collaboration | 1 Tsinghua University, 2 Shanghai Jiao Tong University, 3 Galbot, 4 Peking University, 5 UIUC, 6 ShanghaiTech University, 7 Eastern Institute of Technology, 8 Shanghai Qi Zhi Institute
Pseudocode | No | The paper describes methods through text and diagrams (e.g., Figure 4 for PointSO model architecture, Figure 5 for SOFAR system overview), but does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | Project Page, GitHub Code, Hugging Face
Open Datasets | Yes | To support this, we construct OrienText300K, a large-scale dataset of 3D objects annotated with semantic orientations, and develop PointSO, a general model for zero-shot semantic orientation prediction. By integrating semantic orientation into VLM agents, our SOFAR framework enables 6-DoF spatial reasoning and generates robotic actions. Extensive experiments demonstrated the effectiveness and generalization of our SOFAR, e.g., zero-shot 48.7% successful rate on Open6DOR and zero-shot 74.9% successful rate on SIMPLER-Env.
Dataset Splits | Yes | To validate annotation quality, we construct a validation set containing 208 samples with manually labeled filtering criteria and semantic orientation labels, respectively. From Fig. 3b, we observe that GPT-4o achieves an average accuracy of 88.3% and 97.1% accuracy on filtering and annotating, respectively. This provides a quality guarantee of our OrienText300K. In the Open6DOR [25] task, we supplement the training dataset with additional samples retrieved and manually annotated from Objaverse [20], ensuring alignment with the object categories in the original benchmark. This dataset includes approximately 3,000 6-DoF object manipulation instructions.
Hardware Specification | Yes | GPU device: 8 × H800
Software Dependencies | No | The paper mentions various software components and models (e.g., CLIP, PointNet, LLaMA, SAM, Florence-2, GPT-4o, robosuite, OMPL, GSNet, SAPIEN) but does not provide specific version numbers for these components to ensure reproducibility of software dependencies.
Experiment Setup | Yes | Table 14: Training recipes for PointSO and SOFAR-LLaVA. This table includes details such as optimizer (AdamW), learning rate (5e-5, 2e-5), weight decay (5e-2, 0), learning rate scheduler (cosine), training epochs (300, 50, 2), warmup epochs (10, 5, 0.03), batch size (256, 128), drop path rate (0.2), number of points (10000), number of point patches (512), point patch size (32), and augmentation strategies (Rot&Part&Noise, Rotation).
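
The cosine learning-rate scheduler with warmup named in the Experiment Setup row can be illustrated with a minimal Python sketch. It rests on two assumptions that the row does not confirm: that the schedule is a linear warmup followed by cosine decay to zero, and that the first value in each list (learning rate 5e-5, 300 training epochs, 10 warmup epochs) belongs to the same training run. The function name `lr_at_epoch` and the `min_lr` parameter are hypothetical and do not come from the paper.

```python
import math


def lr_at_epoch(epoch, base_lr=5e-5, total_epochs=300, warmup_epochs=10, min_lr=0.0):
    """Hypothetical helper: linear warmup, then cosine decay (assumed schedule shape)."""
    if epoch < warmup_epochs:
        # Linear ramp from base_lr / warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the post-warmup phase completed, in [0, 1).
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    # Cosine curve from base_lr (progress = 0) down towards min_lr (progress = 1).
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the learning rate at a few representative epochs.
    for epoch in (0, 9, 10, 155, 299):
        print(epoch, lr_at_epoch(epoch))
```

Under these assumptions the rate peaks at the end of warmup, reaches half of its peak midway through the decay phase, and approaches zero in the final epoch. The fractional warmup value 0.03 in the row may be a ratio rather than an epoch count, so this sketch does not cover it.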