Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare

Authors: Hanwei Zhu, Haoning Wu, Yixuan Li, Zicheng Zhang, Baoliang Chen, Lingyu Zhu, Yuming Fang, Guangtao Zhai, Weisi Lin, Shiqi Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on nine IQA datasets validate that Compare2Score effectively bridges text-defined comparative levels during training with a converted single-image quality score for inference, surpassing state-of-the-art IQA models across diverse scenarios.
Researcher Affiliation | Academia | 1) City University of Hong Kong; 2) Nanyang Technological University; 3) Shanghai Jiao Tong University; 4) South China Normal University; 5) Jiangxi University of Finance and Economics; 6) Shenzhen Research Institute, City University of Hong Kong
Pseudocode | No | The paper does not contain a dedicated pseudocode block or algorithm steps presented in a structured, code-like format.
Open Source Code | Yes | The source codes and instructions are attached to the supplementary material.
Open Datasets | Yes | IQA Datasets. We conduct comprehensive experiments across six standard IQA datasets. These datasets are categorized based on the type of distortions they contain: synthetic distortions are featured in LIVE [53], CSIQ [24], and KADID-10k [26]; realistic distortions are present in BID [25], LIVE Challenge (denoted as CLIVE) [54], and KonIQ-10k [7]. More details regarding these IQA datasets can be found in Appendix A.2. For our experiments, we utilize the ten splits provided by LIQE (https://github.com/zwx8981/LIQE/tree/main/IQA_Database), allocating 70% of images from each dataset for training, 10% for validation, and the remaining 20% for testing. ... The medians of the Spearman's rank correlation coefficient (SRCC) and Pearson linear correlation coefficient (PLCC) across the ten splits are reported in the tables. (A sketch of this evaluation protocol appears after the table.)
Dataset Splits | Yes | For our experiments, we utilize the ten splits provided by LIQE, allocating 70% of images from each dataset for training, 10% for validation, and the remaining 20% for testing.
Hardware Specification | Yes | This process requires seven NVIDIA A40 GPUs to meet the computational load. During inference, a single NVIDIA RTX 3090 GPU is sufficient for executing the soft comparison (Sec. 3.4). (An illustrative soft-comparison sketch appears after the table.)
Software Dependencies | No | The paper mentions the specific models used (e.g., mPLUG-Owl2, CLIP-ViT-L14, LLaMA2-7B) and the training loss (GPT loss), but it does not specify version numbers for general software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | Training is conducted with a batch size of 64 across all datasets, a fixed learning rate of 2 × 10⁻⁵, and spans two epochs. ... Furthermore, to obtain the anchor images, we divide the training set of KonIQ-10k into five (α = 5) quality intervals based on their MOSs [57], from which we select one (β = 1) representative anchor image per interval using Eqn. (4). (An anchor-selection sketch appears after the table.)
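
The Open Datasets and Dataset Splits rows quote the evaluation protocol: ten fixed 70/10/20 splits, with the median SRCC and PLCC over the ten test sets reported. Below is a minimal sketch of that reporting step; the synthetic `mos` and `predicted` arrays are placeholders standing in for a real data loader, not the paper's data.

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

def correlations(predicted, mos):
    """SRCC and PLCC between model scores and ground-truth MOSs on one test split."""
    srcc = spearmanr(predicted, mos)[0]
    plcc = pearsonr(predicted, mos)[0]
    return srcc, plcc

rng = np.random.default_rng(0)
srcc_runs, plcc_runs = [], []
for split_id in range(10):  # the ten LIQE-provided train/val/test splits
    # Placeholder stand-ins for one split's test-set MOSs and model predictions.
    mos = rng.uniform(0.0, 100.0, size=200)
    predicted = mos + rng.normal(0.0, 10.0, size=200)
    s, p = correlations(predicted, mos)
    srcc_runs.append(s)
    plcc_runs.append(p)

# The paper reports the median over the ten splits.
print(f"median SRCC: {np.median(srcc_runs):.3f}")
print(f"median PLCC: {np.median(plcc_runs):.3f}")
```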
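
The Experiment Setup row quotes the anchor-selection procedure: the KonIQ-10k training set is divided into α = 5 quality intervals by MOS, and β = 1 representative anchor image is chosen per interval via the paper's Eqn. (4). Since Eqn. (4) is not reproduced in this report, the sketch below substitutes a simple closest-to-interval-midpoint criterion; the function name and the synthetic MOS array are likewise assumptions.

```python
import numpy as np

def select_anchors(mos, alpha=5, beta=1):
    """Pick `beta` anchor indices per MOS interval, over `alpha` equal-width intervals."""
    edges = np.linspace(mos.min(), mos.max(), alpha + 1)
    anchors = []
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # Half-open bins, except the last bin, which includes the maximum MOS.
        in_bin = np.where((mos >= lo) & ((mos < hi) | (i == alpha - 1)))[0]
        # Stand-in for the paper's Eqn. (4): take the image(s) whose MOS is
        # closest to the interval midpoint.
        order = in_bin[np.argsort(np.abs(mos[in_bin] - (lo + hi) / 2.0))]
        anchors.extend(order[:beta].tolist())
    return anchors

mos = np.random.default_rng(0).uniform(1.0, 5.0, size=10_000)  # placeholder MOSs
print(select_anchors(mos))  # five indices, one anchor per quality interval
```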
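
The Hardware Specification row mentions the "soft comparison" inference step (the paper's Sec. 3.4), in which the multimodal model's comparison probabilities against the anchors are converted into a single quality score. The paper's exact conversion is not quoted in this report; the sketch below shows one simple way such a conversion can work (the expected number of anchors the test image surpasses) and should be read as an illustration, not the paper's procedure.

```python
import numpy as np

def soft_comparison_score(probs_better):
    """Convert per-anchor comparison probabilities into a scalar quality score.

    probs_better[i] is the model's probability that the test image has higher
    quality than anchor i (anchors ordered from worst to best). Summing gives
    the expected number of anchors surpassed, a score in
    [0, len(probs_better)] that grows monotonically with quality.
    """
    return float(np.sum(probs_better))

# Hypothetical comparison probabilities against five anchors (worst to best).
probs = [0.95, 0.85, 0.60, 0.25, 0.05]
print(f"quality score: {soft_comparison_score(probs):.2f} / {len(probs)}")
```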