Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation

Authors: Yunhong Min, Daehyeon Choi, Kyeongmin Yeo, Jihyun Lee, Minhyuk Sung

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments show that ORIGEN outperforms both training-based and test-time guidance methods across quantitative metrics and user studies. ... Since no existing method has quantitatively evaluated 3D orientation grounding in text-to-image generation (except for user studies by [18]), we curate a benchmark based on the MS-COCO dataset [23]... We demonstrate that ORIGEN significantly outperforms previous orientation-conditioned image generative models [18, 17] on both our benchmark and user studies. We compare ORIGEN's performance from two perspectives: (1) Orientation Alignment and (2) Text-to-Image Alignment. Table 1: Quantitative comparisons on 3D orientation grounded image generation.
Researcher Affiliation	Academia	Yunhong Min Daehyeon Choi Kyeongmin Yeo Jihyun Lee Minhyuk Sung KAIST EMAIL
Pseudocode	Yes	Algorithm 1 ORIGEN
Open Source Code	No	Project Page: https://origen2025.github.io. ... Our two main datasets ORIBENCH-Single and ORIBENCH-Multi (in Sec. 4.1) are carefully curated subsets of the MS-COCO validation set [23], specifically selected to support fine-grained evaluation of 3D orientation controllability in image generation. While these datasets are not currently released, they are constructed from publicly available data using a reproducible filtering protocol, which we plan to release in future revisions. The code is also under preparation for public release.
Open Datasets	Yes	Since no existing method has quantitatively evaluated 3D orientation grounding in text-to-image generation (except for user studies by [18]), we curate a benchmark based on the MS-COCO dataset [23], mixing and matching object classes and orientations to create images with single or multiple orientation-grounded objects. ... Microsoft COCO: Common objects in context. In ECCV, 2014.
Dataset Splits	No	ORIBENCH-Single was constructed by mixing and matching the image captions and grounding orientations, ultimately forming a dataset of 25 object classes with 40 samples each, totaling 1K samples (see Appendix E for the object classes used). ORIBENCH-Multi. ... forming a dataset of 371 samples, each containing a varying number of objects.
Hardware Specification	Yes	All experiments were conducted on a single NVIDIA A6000 GPU with 48GB of VRAM. ... All measurements were conducted on a single A100 GPU with 80GB of memory.
Software Dependencies	No	We use FLUX-Schnell [21] as the one-step generative model for both ORIGEN and ReNO [22], while all multi-step guided generation baselines (DPS [32], MPGD [33], and FreeDoM [35]) are implemented using FLUX-Dev [21] as the multi-step generative model. ... We used FLUX-Schnell [21] as our one-step T2I generative model and Orient Anything (ViT-L) [20] to measure the Orientation Grounding Reward, as detailed in Sec. 3.2.
Experiment Setup	Yes	For all experiments, we set η = 0.8 in Alg. 1, and used γ = 0.3 for ORIBENCH-Single and γ = 0.2 for ORIBENCH-Multi, as this configuration provides a favorable balance between image quality and computational cost when evaluating with 50 NFEs. ... We set the gradient weight hyperparameter to 0.3 for ORIBENCH-Single and 0.2 for ORIBENCH-Multi for all methods. In the case of ReNO [22], we use a norm-based regularization weight of 0.01. To ensure a fair comparison, we matched the number of function evaluations (NFEs) for all training-free one-step and multi-step guidance methods to 50.
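The Dataset Splits, Software Dependencies, and Experiment Setup rows together pin down most of the reported configuration, so a reader attempting a reproduction may find it useful to collect those values in one place. The Python sketch below does that. It is a hypothetical summary, not the authors' code (which the Open Source Code row notes is unreleased): the names `BenchmarkConfig`, `ExperimentConfig`, `run_settings`, and `weight` are our own, and storing γ (Alg. 1) and the per-benchmark "gradient weight" in a single field is an assumption based only on their matching reported values.

```python
from dataclasses import dataclass

# Hypothetical summary of the reported ORIGEN setup. All names here are
# illustrative; they are not taken from the (unreleased) official code.


@dataclass(frozen=True)
class BenchmarkConfig:
    name: str
    num_samples: int
    # Per-benchmark weight: reported as gamma in Alg. 1 and as the gradient
    # weight used for all methods. Merging the two is an assumption.
    weight: float


@dataclass(frozen=True)
class ExperimentConfig:
    eta: float = 0.8               # eta in Alg. 1 (ORIGEN only)
    nfe_budget: int = 50           # NFEs matched across training-free methods
    reno_reg_weight: float = 0.01  # ReNO norm-based regularization weight
    # Model used to compute the Orientation Grounding Reward (Sec. 3.2).
    reward_model: str = "Orient Anything (ViT-L)"


# ORIBENCH-Single: 25 object classes x 40 samples each = 1,000 samples.
ORIBENCH_SINGLE = BenchmarkConfig("ORIBENCH-Single", num_samples=25 * 40, weight=0.3)
# ORIBENCH-Multi: 371 samples, each with a varying number of objects.
ORIBENCH_MULTI = BenchmarkConfig("ORIBENCH-Multi", num_samples=371, weight=0.2)

# Generative backbone per method, as listed under Software Dependencies.
BACKBONES = {
    "ORIGEN": "FLUX-Schnell",  # one-step
    "ReNO": "FLUX-Schnell",    # one-step
    "DPS": "FLUX-Dev",         # multi-step guidance baselines
    "MPGD": "FLUX-Dev",
    "FreeDoM": "FLUX-Dev",
}


def run_settings(method: str, bench: BenchmarkConfig,
                 cfg: ExperimentConfig = ExperimentConfig()) -> dict:
    """Collect the reported settings for one (method, benchmark) pair."""
    if method not in BACKBONES:
        raise ValueError(f"unknown method: {method}")
    settings = {
        "backbone": BACKBONES[method],
        "benchmark": bench.name,
        "num_samples": bench.num_samples,
        "weight": bench.weight,   # same value for all methods on a benchmark
        "nfe": cfg.nfe_budget,
    }
    # Method-specific hyperparameters are attached only where they apply.
    if method == "ORIGEN":
        settings["eta"] = cfg.eta
    elif method == "ReNO":
        settings["reg_weight"] = cfg.reno_reg_weight
    return settings


if __name__ == "__main__":
    s = run_settings("ReNO", ORIBENCH_SINGLE)
    print(s["backbone"], s["weight"], s["reg_weight"])  # FLUX-Schnell 0.3 0.01
```

Keeping per-benchmark values on the benchmark record and method-specific values (η for ORIGEN, the regularization weight for ReNO) behind the method lookup makes it harder to apply an ORIGEN-only setting to a baseline by accident.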