Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
MVReward: Better Aligning and Evaluating Multi-View Diffusion Models with Human Preferences
Authors: Weitao Wang, Haoran Xu, Yuxiao Yang, Zhifang Liu, Jun Meng, Haoqian Wang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that MVReward can serve as a reliable metric and MVP consistently enhances the alignment of multi-view diffusion models with human preferences. We conduct a user study to evaluate MVReward's ability in predicting human preferences. We perform ablation studies on the encoder backbone, multi-view self-attention, and negative samples to assess their effects on MVReward. |
| Researcher Affiliation | Academia | ¹Tsinghua Shenzhen International Graduate School, Tsinghua University; ²Zhejiang University |
| Pseudocode | Yes | Algorithm 1: Multi-View Preference Learning (MVP) for Multi-View DMs |
| Open Source Code | Yes | Code https://github.com/victor-thu/MVReward |
| Open Datasets | Yes | We begin by generating and filtering a standardized image prompt set from DALL·E (Ramesh et al. 2021) and Objaverse (Deitke et al. 2023), ensuring the object(s) in each image are fully visible with well-designed geometry and texture. Furthermore, taking the widely-used GSO dataset (Downs et al. 2022) as an example |
| Dataset Splits | Yes | The training, validation and test datasets are split according to an 8:1:1 ratio. |
| Hardware Specification | Yes | Optimal performance is achieved with a batch size of 96 in total, an initial learning rate of 4e-5 using cosine annealing, on 4 NVIDIA Quadro RTX 8000. Both models are fine-tuned in half-precision on 8 NVIDIA Quadro RTX 8000, with a batch size of 128 in total and a learning rate of 5e-6 with warm-up. |
| Software Dependencies | No | The paper mentions BLIP and ViT-B as pre-trained models but does not specify version numbers for any software libraries or dependencies used in its implementation. |
| Experiment Setup | Yes | Optimal performance is achieved with a batch size of 96 in total, an initial learning rate of 4e-5 using cosine annealing, on 4 NVIDIA Quadro RTX 8000. Both models are fine-tuned in half-precision on 8 NVIDIA Quadro RTX 8000, with a batch size of 128 in total and a learning rate of 5e-6 with warm-up. The model parameters are fixed except for the designated trainable modules within the UNet. |
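
The Dataset Splits and Experiment Setup rows quote an 8:1:1 split and an initial learning rate of 4e-5 with cosine annealing. The following is a minimal sketch of what those two settings could look like, not the authors' implementation. The function names, the shuffling seed, the `min_lr` floor of zero, and the total step count are all assumptions for illustration; the quoted text does not say how the split was drawn or how long the schedule runs.

```python
import math
import random


def split_8_1_1(items, seed=0):
    """Shuffle items and split them into train/val/test at an 8:1:1 ratio.

    The seed and the shuffle-then-slice procedure are assumptions;
    the source only states the ratio.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def cosine_annealed_lr(step, total_steps, base_lr=4e-5, min_lr=0.0):
    """Textbook cosine annealing from base_lr (step 0) to min_lr (total_steps).

    base_lr matches the quoted 4e-5; min_lr and total_steps are assumed.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    train, val, test = split_8_1_1(range(1000))
    print(len(train), len(val), len(test))
    for step in (0, 500, 1000):
        print(step, cosine_annealed_lr(step, total_steps=1000))
```

In a real training run these roles would usually be filled by the framework's own dataset-splitting and scheduler utilities; the plain-Python version here only makes the arithmetic explicit.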