Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Nabla-R2D3: Effective and Efficient 3D Diffusion Alignment with 2D Rewards

Authors: Qingming LIU, Zhen Liu, Dinghuai Zhang, Kui Jia

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments show that, unlike vanilla finetuning baselines which either struggle to converge or suffer from reward hacking, Nabla-R2D3 consistently achieves higher rewards and reduced prior forgetting within a few finetuning steps. ... Our extensive experiments show that, compared to the proposed vanilla reward finetuning baselines, our Nabla-R2D3 can effectively, efficiently and robustly finetune 3D-native generative models from 2D reward models with better preference alignment, better text-object alignment and fewer geometric artifacts. ... 5 Experiments
Researcher Affiliation	Collaboration	Qingming Liu¹,*, Zhen Liu¹,*, Dinghuai Zhang², Kui Jia¹; ¹The Chinese University of Hong Kong, Shenzhen; ²Microsoft Research
Pseudocode	Yes	A Overall algorithm ... Algorithm 1 3D-Native Diffusion Alignment with 2D Rewards using Nabla-R2D3
Open Source Code	Yes	We have released the whole set of code, instructions and data.
Open Datasets	Yes	Prompt dataset. We use the prompt sets in G-Objaverse [32], a high-quality subset of the large 3D object dataset Objaverse [5]. For experiments on geometry rewards, we filter out the prompts for which the base models yield very low reward values. ... Aesthetic Score [17], trained on the LAION-Aesthetic dataset [17]... HPSv2 [45], trained on HPDv2 dataset [45]...
Dataset Splits	Yes	We use 60 unseen random prompts during the finetuning process for evaluation. For each prompt, we sample a batch of 3D assets (of size 32) to compute the metrics.
Hardware Specification	Yes	All experiments were conducted with either two Nvidia Tesla V100 GPUs or GeForce GTX 3090 GPUs.
Software Dependencies	No	The paper mentions various models and frameworks (e.g., DiffSplat, PixArt-Σ, Stable Diffusion-v1.5, DPM-Solver, DDIM, LoRA) but does not provide specific version numbers for any underlying software libraries or programming languages.
Experiment Setup	Yes	We set λ to 3e3, 5e3, 1e4 for Aesthetic Score, HPSv2 and Geometry Reward respectively. During training, we sub-sample 40% of the transitions from each collected trajectory. For HPSv2 and Aesthetic Score experiments, we set the reward temperature β to 2e6 and 1e7, respectively; for geometry rewards, we set β to 1e6. To sample camera views c, we first sample four orthogonal views (front, left, back and right) with randomly sampled elevation 20 and then apply azimuthal perturbations by adding random offsets within a predefined range 60. We use a learning rate of 10^-4 for ReFL, DDPO, DRaFT and Nabla-R2D3. The CFG scales are set to 7.5, 7.5, and 3.5 for DiffSplat-PixArt-Σ, DiffSplat-SD1.5, and GaussianCube, respectively.
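
The Experiment Setup row can be summarized as a small configuration sketch. The following Python is illustrative only and is not the authors' released code: the dictionary layout, key names, and both helper functions are hypothetical. The learning rate is taken as 1e-4. The elevation value 20 and the azimuth offset range 60 are interpreted as symmetric bounds in degrees with uniform sampling, and the front/left/back/right views are mapped to azimuths 0/90/180/270; all of these are assumptions, since the listing does not state them.

```python
import random

# Values quoted in the Experiment Setup row; layout and key names are hypothetical.
HYPERPARAMS = {
    "aesthetic_score": {"lambda": 3e3, "beta": 1e7},
    "hpsv2": {"lambda": 5e3, "beta": 2e6},
    "geometry_reward": {"lambda": 1e4, "beta": 1e6},
}
CFG_SCALES = {
    "DiffSplat-PixArt-Sigma": 7.5,
    "DiffSplat-SD1.5": 7.5,
    "GaussianCube": 3.5,
}
LEARNING_RATE = 1e-4        # assumed reading of the reported learning rate
TRANSITION_FRACTION = 0.4   # 40% of transitions per collected trajectory


def subsample_transitions(trajectory, fraction=TRANSITION_FRACTION, rng=None):
    """Keep a random subset of transitions, preserving their original order."""
    if not trajectory:
        return []
    rng = rng or random.Random()
    k = max(1, round(fraction * len(trajectory)))
    kept = sorted(rng.sample(range(len(trajectory)), k))
    return [trajectory[i] for i in kept]


def sample_camera_views(rng=None, max_elevation_deg=20.0, max_offset_deg=60.0):
    """Sample four perturbed orthogonal views as (azimuth, elevation) in degrees.

    Assumption: elevation and azimuth offsets are drawn uniformly from
    symmetric ranges; the source does not specify the distribution.
    """
    rng = rng or random.Random()
    base_azimuths = [0.0, 90.0, 180.0, 270.0]  # assumed: front, left, back, right
    views = []
    for base in base_azimuths:
        elevation = rng.uniform(-max_elevation_deg, max_elevation_deg)
        azimuth = (base + rng.uniform(-max_offset_deg, max_offset_deg)) % 360.0
        views.append((azimuth, elevation))
    return views
```

Under these assumptions, a 20-step trajectory yields 8 retained transitions per call, and each call to sample_camera_views returns four (azimuth, elevation) pairs.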