Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Orient Anything V2: Unifying Orientation and Rotation Understanding
Authors: Zehan Wang, Ziang Zhang, Jiayang Xu, Jialei Wang, Tianyu Pang, Chao Du, Hengshuang Zhao, Zhou Zhao
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that Orient Anything V2 achieves state-of-the-art zero-shot performance on orientation estimation, 6DoF pose estimation, and object symmetry recognition across 11 widely used benchmarks. The model demonstrates strong generalization, significantly broadening the applicability of orientation estimation in diverse downstream tasks. Our experiments demonstrate the enhanced and novel capabilities of our model. It achieves superior performance on zero-shot orientation estimation and sets new records on zero-shot rotation estimation (i.e., 6DoF pose estimation [49, 26]), while also accurately handling and predicting different rotational symmetries. |
| Researcher Affiliation | Collaboration | Zehan Wang1,2, Ziang Zhang1, Jiayang Xu1, Jialei Wang1, Tianyu Pang3, Chao Du3, Hengshuang Zhao4, Zhou Zhao1,2. Affiliations: 1Zhejiang University; 2Shanghai AI Lab; 3Sea AI Lab; 4The University of Hong Kong |
| Pseudocode | No | The paper describes methods and processes but does not include any explicitly labeled pseudocode or algorithm blocks. The data engine and framework sections are presented as prose and figures, not as structured algorithms. |
| Open Source Code | Yes | The project website is listed as https://orient-anythingv2.github.io/. Additionally, the NeurIPS checklist question 5 explicitly states 'Does the paper provide open access to the data and code...?' with the answer '[Yes]' and justification 'refer to section 6.' |
| Open Datasets | Yes | Orient Anything V1 uses advanced VLM [12, 31] to annotate real 3D assets from Objaverse [8, 7]. ... The proposed data engine enables highly cost-effective and flexible data scaling up... Our final dataset includes 600K assets... The training dataset comprises the ImageNet3D training set and newly collected 600k synthetic assets. We mainly compare with Orient Anything V1 [48] on ImageNet3D [29] test set and unseen test datasets, SUN-RGBD [41], ARKitScenes [3], Pascal3D+ [52], Objectron [1] and the Ori_COCO [48]. We benchmark zero-shot 6DoF object pose estimation performance under a single reference view. Evaluation is conducted on four widely used datasets: LINEMOD [16], YCB-Video [5], OnePose++ [15], and OnePose [44]. This evaluation uses the recent, large-scale 3D object datasets with rotational symmetry annotations: Omni6DPose [58], which contain 149 distinct object classes. |
| Dataset Splits | Yes | We mainly compare with Orient Anything V1 [48] on ImageNet3D [29] test set... The training dataset comprises the ImageNet3D training set and newly collected 600k synthetic assets. The main evaluation metrics are the median 3D angle error (Med) and accuracy within 30 degrees (Acc30). For Ori_COCO, where 20 samples are collected for each class and annotated within 8 horizontal orientations, recognition accuracy (Acc) is used. We manually select a subset of 3-5 assets per category and render 2 views per 3D asset for testing. This resulted in 838 testing sample. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. While the NeurIPS checklist states 'Yes' for 'Experiments compute resources' and refers to section 6, section 6 itself does not contain this information. |
| Software Dependencies | No | The paper mentions models like DINOv2 and VGGT being used, but it does not specify any software dependencies with their version numbers (e.g., Python, PyTorch, CUDA versions) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | Our model is initialized from VGGT, a large feed-forward transformer with 1.2 billion parameters pre-trained on 3D geometry tasks. We repurpose its original "camera" token, designed to predict camera extrinsics, to predict object orientation and rotation. This leverages the inherent correlation between camera pose and object rotation. We train the model to fit target orientation (or rotation) distributions using Binary Cross-Entropy (BCE) loss for 20k iterations. A cosine learning rate scheduler is used with an initial rate of 1e-3. Input frames are resized to 518, and random patch masking is used for data augmentation to simulate real-world occlusion. The effective batch size is set to 48, where 1-2 frames are randomly sampled for each training sample. The training dataset comprises the ImageNet3D training set and newly collected 600k synthetic assets. Furthermore, we observe that most objects exhibit only four types of rotational symmetry: {0, 1, 2, 4}. Therefore, we restrict our training to consider only these four cases. Any fitted periodicity α > 4 is mapped to 0. |
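
Two details quoted in the Experiment Setup row, the cosine learning rate schedule and the symmetry mapping, can be illustrated with a minimal sketch. This is not the authors' code. The function names `cosine_lr` and `map_periodicity` are hypothetical, the schedule assumes no warmup and a final rate of 0 (neither is stated in the quoted text), and rejecting a fitted periodicity of 3 is likewise an assumption, since the quoted text only covers values above 4.

```python
import math

# Values quoted in the Experiment Setup row.
TOTAL_ITERS = 20_000               # "20k iterations"
BASE_LR = 1e-3                     # "initial rate of 1e-3"
ALLOWED_SYMMETRIES = (0, 1, 2, 4)  # the four rotational symmetry types


def cosine_lr(step, total=TOTAL_ITERS, base_lr=BASE_LR, min_lr=0.0):
    """Cosine decay from base_lr to min_lr over `total` steps.

    Assumption: no warmup and min_lr = 0; the quoted setup states neither.
    """
    step = min(max(step, 0), total)  # clamp to the training range
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / total))
    return min_lr + (base_lr - min_lr) * cosine


def map_periodicity(alpha):
    """Map a fitted periodicity to one of the four training labels.

    Values above 4 are mapped to 0, as quoted. Other values outside
    {0, 1, 2, 4} (e.g. 3) are rejected here, which is an assumption.
    """
    if alpha > 4:
        return 0
    if alpha not in ALLOWED_SYMMETRIES:
        raise ValueError(f"unsupported periodicity: {alpha}")
    return alpha
```

Under these assumptions the rate starts at 1e-3, reaches half that value at iteration 10,000, and decays to 0 at iteration 20,000, while a fitted periodicity such as 6 is relabeled as 0.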