Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Nabla-R2D3: Effective and Efficient 3D Diffusion Alignment with 2D Rewards
Authors: Qingming LIU, Zhen Liu, Dinghuai Zhang, Kui Jia
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that, unlike vanilla finetuning baselines which either struggle to converge or suffer from reward hacking, Nabla-R2D3 consistently achieves higher rewards and reduced prior forgetting within a few finetuning steps. ... Our extensive experiments show that, compared to the proposed vanilla reward finetuning baselines, our Nabla-R2D3 can effectively, efficiently and robustly finetune 3D-native generative models from 2D reward models with better preference alignment, better text-object alignment and fewer geometric artifacts. ... 5 Experiments |
| Researcher Affiliation | Collaboration | Qingming Liu¹,*, Zhen Liu¹,*, Dinghuai Zhang², Kui Jia¹; ¹The Chinese University of Hong Kong, Shenzhen; ²Microsoft Research |
| Pseudocode | Yes | A Overall algorithm ... Algorithm 1 3D-Native Diffusion Alignment with 2D Rewards using Nabla-R2D3 |
| Open Source Code | Yes | We have released the whole set of code, instructions and data. |
| Open Datasets | Yes | Prompt dataset. We use the prompt sets in G-Objaverse [32], a high-quality subset of the large 3D object dataset Objaverse [5]. For experiments on geometry rewards, we filter out the prompts for which the base models yield very low reward values. ... Aesthetic Score [17], trained on the LAION-Aesthetic dataset [17]... HPSv2 [45], trained on HPDv2 dataset [45]... |
| Dataset Splits | Yes | We use 60 unseen random prompts during the finetuning process for evaluation. For each prompt, we sample a batch of 3D assets (of size 32) to compute the metrics. |
| Hardware Specification | Yes | All experiments were conducted with either two Nvidia Tesla V100 GPUs or GeForce GTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions various models and frameworks (e.g., DiffSplat, PixArt-Σ, Stable Diffusion-v1.5, DPM-Solver, DDIM, LoRA) but does not provide specific version numbers for any underlying software libraries or programming languages. |
| Experiment Setup | Yes | We set λ to 3e3, 5e3, 1e4 for Aesthetic Score, HPSv2 and Geometry Reward respectively. During training, we sub-sample 40% of the transitions from each collected trajectory. For HPSv2 and Aesthetic Score experiments, we set the reward temperature β to 2e6 and 1e7, respectively; for geometry rewards, we set β to 1e6. To sample camera views c, we first sample four orthogonal views (front, left, back and right) with randomly sampled elevation 20° and then apply azimuthal perturbations by adding random offsets within a predefined range 60°. We use a learning rate of 10⁻⁴ for ReFL, DDPO, DRaFT and Nabla-R2D3. The CFG scales are set to 7.5, 7.5, and 3.5 for DiffSplat-Pixart-Σ, DiffSplat-SD1.5, and GaussianCube, respectively. |
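
The Experiment Setup row can be collected into a small configuration sketch. This is not the authors' released code: every class, function, and key name below is hypothetical. The quoted text does not say how the 40% of transitions are chosen or how the elevation and azimuth ranges are parameterized, so the sketch assumes uniform sampling without replacement, symmetric angle ranges in degrees, and a front view at azimuth 0.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardSetup:
    lam: float   # λ, value as quoted in the table
    beta: float  # reward temperature β, value as quoted in the table


# Values copied from the Experiment Setup row; the dictionary keys are made up.
SETUPS = {
    "aesthetic_score": RewardSetup(lam=3e3, beta=1e7),
    "hpsv2": RewardSetup(lam=5e3, beta=2e6),
    "geometry": RewardSetup(lam=1e4, beta=1e6),
}
CFG_SCALES = {
    "DiffSplat-Pixart-Σ": 7.5,
    "DiffSplat-SD1.5": 7.5,
    "GaussianCube": 3.5,
}
LEARNING_RATE = 1e-4  # quoted for ReFL, DDPO, DRaFT and Nabla-R2D3


def subsample_transitions(trajectory, fraction=0.4, rng=None):
    """Keep a fraction of a trajectory's transitions, preserving their order.

    Assumption: uniform sampling without replacement, rounded to the
    nearest whole number of transitions (at least one).
    """
    rng = rng or random.Random()
    k = max(1, round(fraction * len(trajectory)))
    kept = sorted(rng.sample(range(len(trajectory)), k))
    return [trajectory[i] for i in kept]


def sample_camera_views(rng, max_elevation_deg=20.0, max_azimuth_offset_deg=60.0):
    """Return four (azimuth, elevation) pairs in degrees.

    Assumptions: base azimuths 0/90/180/270 stand for front/left/back/right,
    and both the elevation and the azimuth offset are drawn uniformly from a
    symmetric interval, independently for each view.
    """
    views = []
    for base_azimuth in (0.0, 90.0, 180.0, 270.0):
        offset = rng.uniform(-max_azimuth_offset_deg, max_azimuth_offset_deg)
        elevation = rng.uniform(-max_elevation_deg, max_elevation_deg)
        views.append(((base_azimuth + offset) % 360.0, elevation))
    return views


if __name__ == "__main__":
    rng = random.Random(0)
    print(len(subsample_transitions(list(range(50)), rng=rng)))
    print(sample_camera_views(rng))
```

Passing an explicit `random.Random` instance keeps the sampling repeatable across runs. If the paper's ranges are one-sided or the elevation is shared across the four views, only the two `rng.uniform` calls need to change.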