Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank

Authors: Tianhe Wu, Jian Zou, Jie Liang, Lei Zhang, Kede Ma

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments show that the proposed VisualQuality-R1 consistently outperforms discriminative deep learning-based NR-IQA models as well as a recent reasoning-induced quality regression method. Moreover, VisualQuality-R1 is capable of generating contextually rich, human-aligned quality descriptions, and supports multi-dataset training without requiring perceptual scale realignment. These features make VisualQuality-R1 especially well-suited for reliably measuring progress in a wide range of image processing tasks like super-resolution and image generation. To validate VisualQuality-R1, we conduct comprehensive experiments across diverse distortion scenarios, ablation studies on key design components, and in-depth analysis of model behaviors.
Researcher Affiliation	Collaboration	Tianhe Wu (1,2), Jian Zou (1), Jie Liang (2), Lei Zhang (2,3), and Kede Ma (1). (1) City University of Hong Kong; (2) OPPO Research Institute; (3) The Hong Kong Polytechnic University
Pseudocode	No	The paper describes the methodology and algorithms using mathematical equations and textual explanations, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: The code will be made publicly available after obtaining the company's approval.
Open Datasets	Yes	We first train NR-IQA models on the synthetic KADID-10K [23] training set (6 : 2 : 2 split while ensuring content independence) and test in a zero-shot setting across eight datasets with distortions arising from digital imaging and (post-)processing stages: BID [7], CLIVE [11], KonIQ-10k [15], SPAQ [8], Liu13 (deblurring) [26], SRIQA-Bench (super-resolution) [6], Min19 (dehazing) [32], and AGIQA-3K (image generation) [20].
Dataset Splits	Yes	We first train NR-IQA models on the synthetic KADID-10K [23] training set (6 : 2 : 2 split while ensuring content independence) and test in a zero-shot setting across eight datasets with distortions arising from digital imaging and (post-)processing stages: BID [7], CLIVE [11], KonIQ-10k [15], SPAQ [8], Liu13 (deblurring) [26], SRIQA-Bench (super-resolution) [6], Min19 (dehazing) [32], and AGIQA-3K (image generation) [20].
Hardware Specification	Yes	Training runs on 16 NVIDIA A100 GPUs with a minibatch size of eight per GPU, taking approximately five hours for a total of 10 epochs.
Software Dependencies	No	The paper mentions using the AdamW optimizer and fine-tuning Qwen2.5-VL-7B. However, it does not specify version numbers for any software libraries or frameworks like PyTorch, TensorFlow, etc., that would be needed for replication.
Experiment Setup	Yes	We fine-tune Qwen2.5-VL-7B [1] as the backbone for VisualQuality-R1 using GRPO [36]. The AdamW optimizer [27] is employed with an initial learning rate of 1 × 10^-6 and a linear decay schedule. For GRPO, we generate six candidate responses per prompt (i.e., K = 6) and set the balance coefficient β to 0.04. Training runs on 16 NVIDIA A100 GPUs with a minibatch size of eight per GPU, taking approximately five hours for a total of 10 epochs.
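
The Dataset Splits entry quotes a 6 : 2 : 2 split "while ensuring content independence". Since the paper's code is not released, the following is only a minimal sketch of one common way to build such a split: group distorted images by the reference image they derive from and assign whole groups to a partition. The function name, the `(reference_id, image_name)` input format, the seed, and the toy file names are all hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def content_independent_split(items, ratios=(6, 2, 2), seed=0):
    """Split (reference_id, image_name) pairs into train/val/test by reference.

    Every distorted version of a given reference image lands in the same
    partition, so no image content is shared between partitions.
    """
    # Group distorted images under the reference image they derive from.
    groups = defaultdict(list)
    for ref_id, name in items:
        groups[ref_id].append(name)

    # Shuffle the reference IDs (not the individual images) reproducibly.
    ref_ids = sorted(groups)
    random.Random(seed).shuffle(ref_ids)

    # Apply the ratio to the number of reference images.
    total = sum(ratios)
    n_train = len(ref_ids) * ratios[0] // total
    n_val = len(ref_ids) * ratios[1] // total
    parts = {
        "train": ref_ids[:n_train],
        "val": ref_ids[n_train:n_train + n_val],
        "test": ref_ids[n_train + n_val:],
    }
    return {k: [(r, n) for r in refs for n in groups[r]] for k, refs in parts.items()}


# Toy data: 10 hypothetical reference images with 5 distorted versions each.
toy = [(f"ref{r:02d}", f"ref{r:02d}_dist{d}.png") for r in range(10) for d in range(5)]
splits = content_independent_split(toy)
```

In this sketch the ratio is applied to reference images rather than to individual distorted images, so the per-partition image counts match 6 : 2 : 2 only when every reference has the same number of distorted versions, as in the toy data.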