Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

MVReward: Better Aligning and Evaluating Multi-View Diffusion Models with Human Preferences

Authors: Weitao Wang, Haoran Xu, Yuxiao Yang, Zhifang Liu, Jun Meng, Haoqian Wang

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that MVReward can serve as a reliable metric and MVP consistently enhances the alignment of multi-view diffusion models with human preferences. We conduct a user study to evaluate MVReward's ability in predicting human preferences. We perform ablation studies on the encoder backbone, multi-view self-attention, and negative samples to assess their effects on MVReward.
Researcher Affiliation | Academia | 1 Tsinghua Shenzhen International Graduate School, Tsinghua University; 2 Zhejiang University
Pseudocode | Yes | Algorithm 1: Multi-View Preference Learning (MVP) for Multi-View DMs
Open Source Code | Yes | Code https://github.com/victor-thu/MVReward
Open Datasets | Yes | We begin by generating and filtering a standardized image prompt set from DALL·E (Ramesh et al. 2021) and Objaverse (Deitke et al. 2023), ensuring the object(s) in each image are fully visible with well-designed geometry and texture. Furthermore, taking the widely-used GSO dataset (Downs et al. 2022) as an example
Dataset Splits | Yes | The training, validation and test datasets are split according to an 8:1:1 ratio.
Hardware Specification | Yes | Optimal performance is achieved with a batch size of 96 in total, an initial learning rate of 4e-5 using cosine annealing, on 4 NVIDIA Quadro RTX 8000. Both models are fine-tuned in half-precision on 8 NVIDIA Quadro RTX 8000, with a batch size of 128 in total and a learning rate of 5e-6 with warm-up.
Software Dependencies | No | The paper mentions BLIP and ViT-B as pre-trained models but does not specify version numbers for any software libraries or dependencies used in their implementation.
Experiment Setup | Yes | Optimal performance is achieved with a batch size of 96 in total, an initial learning rate of 4e-5 using cosine annealing, on 4 NVIDIA Quadro RTX 8000. Both models are fine-tuned in half-precision on 8 NVIDIA Quadro RTX 8000, with a batch size of 128 in total and a learning rate of 5e-6 with warm-up. The model parameters are fixed except for the designated trainable modules within the UNet.
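
For readers trying to reproduce the reported setup, the 8:1:1 dataset split and the cosine-annealed learning rate quoted above can be sketched in plain Python. This is an illustrative sketch only, not the authors' code: the function names split_8_1_1 and cosine_lr, the random shuffle, the seed, and the minimum learning rate of 0 are all assumptions, since the quoted text gives only the ratio, the initial learning rate of 4e-5, and the phrase "cosine annealing".

```python
import math
import random


def split_8_1_1(items, seed=0):
    """Shuffle items and split them into train/val/test at an 8:1:1 ratio.

    Assumption: a seeded random shuffle; the source only states the ratio.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = (n * 8) // 10
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def cosine_lr(step, total_steps, base_lr=4e-5, min_lr=0.0):
    """Cosine-annealed learning rate at `step` out of `total_steps`.

    Assumption: decay from base_lr to min_lr over a single cycle with no
    restarts; min_lr=0.0 is a placeholder, not a value from the paper.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    train, val, test = split_8_1_1(range(1000))
    print(len(train), len(val), len(test))
    print(cosine_lr(0, 100), cosine_lr(50, 100), cosine_lr(100, 100))
```

The released repository at the URL listed under Open Source Code is the authoritative reference for how the split and schedule were actually implemented.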