Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Orientation Matters: Making 3D Generative Models Orientation-Aligned

Authors: Yichong Lu, Yuzhuo Tian, Zijin Jiang, Yikun Zhao, Yuanbo Yang, Hao Ouyang, Haoji Hu, Huimin Yu, Yujun Shen, Yiyi Liao

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results demonstrate the superiority of our method over post-hoc alignment approaches. Furthermore, we showcase downstream applications enabled by our aligned object generation, including zero-shot object orientation estimation via analysis-by-synthesis and efficient arrow-based object rotation manipulation. Experimental results across multiple datasets demonstrate that our method achieves superior orientation alignment compared to existing baselines. 6 Experiment 6.1 Implementation Details Dataset. Orientation-aligned 3D generative models Trellis-OA and Wonder3D-OA are trained on our Objaverse-OA dataset, which is curated from Objaverse-LVIS [6]. The base multi-view diffusion model is trained on Objaverse [6], and the base 3D-VAE-based model is trained on TRELLIS500K [57]. To demonstrate the generalizability and accuracy of our method's orientation alignment ability, we evaluate on two unseen datasets, GSO [8] and Toys4k [39]. To further demonstrate the sim-to-real generalizability, we also evaluate on the real-world dataset Imagenet3D [24]. Baselines. For the task of aligned object generation, there are no existing baselines for this task. Therefore, we design baselines that perform this task in two stages: 1) object generation with misaligned orientations, and 2) orient them to aligned poses based on pose estimation using different variants: (i) Principal Component Analysis (PCA); (ii) advanced Vision Language Model (VLM) Gemini-2.0 [32]; and (iii) zero-shot model-free orientation estimation method, Orient Anything [52]. For the task of zero-shot orientation estimation, we compare our method with Orient Anything [52] and FSDet View [58]. Note that FSDet View doesn't support zero-shot estimation. Therefore, we evaluate its performance only on its supported categories. Metrics.
To evaluate the orientation alignment ability, we rotate reconstructed 3D models using different kinds of methods and calculate Chamfer Distance (CD), LPIPS [63], and CLIP [34] scores to measure the orientation alignment quality. To evaluate the performance of our zero-shot orientation estimation method, we calculate Acc@30 and orientation absolute error (Abs) according to the rotation error. We follow NOCS to calculate the rotation error $e_R$, defined by: $e_R = \arccos\left(\frac{\operatorname{Tr}(\tilde{R} R^{T}) - 1}{2}\right)$, where Tr represents the trace of the matrix. Note that for stick-like objects, top and side directions typically have ambiguity. Therefore, we only calculate the rotation error in the front direction.
Researcher Affiliation Collaboration Yichong Lu1,2* Yuzhuo Tian1* Zijin Jiang1* Yikun Zhao1 Yuanbo Yang1,2 Hao Ouyang2 Haoji Hu1 Huimin Yu1 Yujun Shen2 Yiyi Liao1 1Zhejiang University 2Ant Group
Pseudocode No The paper describes methods in prose and uses diagrams (e.g., Figure 3 for model architectures, Figure 4 for orientation estimation pipeline) but does not include any explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code No Answer: [No] Justification: We don't provide code in the submission.
Open Datasets Yes To support this, we introduce Objaverse-OA, a new dataset comprising 14,832 3D models spanning 1,008 categories, each aligned to a consistent, common-sense orientation. Leveraging Objaverse-OA, we fine-tune two representative 3D generative models [57, 23] to produce Trellis-OA and Wonder3D-OA, allowing for generating well-aligned 3D objects across a broad spectrum of categories, including those not included in the fine-tuning set. Our contributions are as follows: ... 2) We construct Objaverse-OA, the largest orientation-aligned 3D dataset in terms of category coverage. In this section, we introduce the construction of our dataset, Objaverse-OA. Dataset diversity plays a crucial role in achieving strong generalization capability. To the best of our knowledge, the existing orientation-aligned 3D dataset [24] with the largest category number includes only 200 categories and fewer than 2,000 3D objects. In contrast, our Objaverse-OA dataset contains 14,832 orientation-aligned 3D objects across 1008 categories, which will be made publicly available to the research community. We evaluate our method on three unseen datasets: GSO [8], Toys4k [39], and Imagenet3D [24].
Dataset Splits No The paper states that Objaverse-OA is used for training and then evaluates on GSO [8], Toys4k [39], and Imagenet3D [24]. For evaluation, it describes how objects were collected and rendered from these datasets (e.g., "randomly collected 48 objects", "rendered into four images", "randomly selected 439 objects"), but it does not specify explicit training/validation/test splits for the Objaverse-OA dataset used to fine-tune the generative models.
Hardware Specification Yes To fine-tune Trellis-OA, we use a total batch size of 64 for training 30000 steps, which takes only about 10 hours on the cluster of 8 Nvidia Tesla A100 GPUs. To fine-tune Wonder3D-OA, we use a total batch size of 512 for training 40000 steps, which takes about 3 days on the cluster of 8 Nvidia Tesla A100 GPUs.
Software Dependencies Yes We utilize our manually curated dataset as ground truth (GT) and show the error rate of VLM's estimation across different categories. We observe that (1) the VLM demonstrates particular difficulty in recognizing front-facing orientations for stick-like objects, and (2) a significant portion of recognition errors occur when processing objects with inherently unclear or ambiguous frontal views. These challenges highlight the necessity of our manual curation. VLM pre-processing: As discovered by Orient Anything [52], advanced VLMs demonstrate the ability to recognize object front views without task-specific training. Since most models in Objaverse primarily vary in the horizontal (yaw) axis, we follow the strategy proposed in Orient Anything: we render each 3D object from four horizontal viewpoints (front, back, left, and right) and use a VLM to identify the correct front view. Based on the identified view, we then rotate the 3D model accordingly to align it to a canonical orientation. Our data processing begins with the Objaverse-LVIS dataset, and we use Gemini-2.0 [32] as the VLM for view recognition. From a total of 46,219 3D models, Gemini successfully identifies front views for 20,664 objects. Table 9: License Blender 4.2.8 GNU General Public License (GPL) https://www.blender.org/ Gemini-2.0 [32]
Experiment Setup Yes Training and inference time. To fine-tune Trellis-OA, we use a total batch size of 64 for training 30000 steps, which takes only about 10 hours on the cluster of 8 Nvidia Tesla A100 GPUs. To fine-tune Wonder3D-OA, we use a total batch size of 512 for training 40000 steps, which takes about 3 days on the cluster of 8 Nvidia Tesla A100 GPUs.
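
The rotation error and Acc@30 metric quoted in the Research Type row can be illustrated with a minimal sketch. It is not the authors' evaluation code (the paper states that no code is provided). The function names, the use of degrees, the "less than or equal to 30" threshold convention, and the reading of Abs as a mean error are all assumptions made for illustration. The front-direction-only handling of stick-like objects is not covered.

```python
import math


def yaw_rotation(deg):
    """Hypothetical helper: 3x3 rotation about the vertical (z) axis."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_error_deg(r_est, r_gt):
    """Angle arccos((Tr(R_est R_gt^T) - 1) / 2), returned in degrees."""
    # Tr(A B^T) equals the sum of element-wise products of A and B.
    trace = sum(r_est[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] so floating-point noise cannot break acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def acc_at(errors_deg, threshold_deg=30.0):
    """Fraction of errors at or below the threshold (assumed convention)."""
    return sum(e <= threshold_deg for e in errors_deg) / len(errors_deg)


def mean_abs_error(errors_deg):
    """Mean rotation error in degrees (assumed reading of 'Abs')."""
    return sum(errors_deg) / len(errors_deg)


if __name__ == "__main__":
    gt = yaw_rotation(0.0)
    # Made-up yaw offsets standing in for estimated orientations.
    errors = [rotation_error_deg(yaw_rotation(a), gt) for a in (10, 45, 20, 90)]
    print(acc_at(errors), mean_abs_error(errors))
```

Clamping the cosine before `acos` is a common precaution: two nearly identical rotations can produce a value slightly above 1 through rounding, which would otherwise raise a domain error.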