Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare
Authors: Hanwei Zhu, Haoning Wu, Yixuan Li, Zicheng Zhang, Baoliang Chen, Lingyu Zhu, Yuming Fang, Guangtao Zhai, Weisi Lin, Shiqi Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on nine IQA datasets validate that Compare2Score effectively bridges text-defined comparative levels during training with converted single-image quality scores for inference, surpassing state-of-the-art IQA models across diverse scenarios (see the score-conversion sketch after the table). |
| Researcher Affiliation | Academia | City University of Hong Kong; Nanyang Technological University; Shanghai Jiao Tong University; South China Normal University; Jiangxi University of Finance and Economics; Shenzhen Research Institute, City University of Hong Kong |
| Pseudocode | No | The paper does not contain a dedicated pseudocode block or algorithm steps presented in a structured, code-like format. |
| Open Source Code | Yes | The source code and instructions are attached to the supplementary material. |
| Open Datasets | Yes | IQA Datasets. We conduct comprehensive experiments across six standard IQA datasets. These datasets are categorized based on the type of distortions they contain: synthetic distortions are featured in LIVE [53], CSIQ [24], and KADID-10k [26]; realistic distortions are present in BID [25], LIVE Challenge (denoted as CLIVE) [54], and KonIQ-10k [7]. More details regarding these IQA datasets can be found in Appendix A.2. For our experiments, we utilize the ten splits provided by LIQE (https://github.com/zwx8981/LIQE/tree/main/IQA_Database), allocating 70% of images from each dataset for training, 10% for validation, and the remaining 20% for testing. ... The medians of the Spearman's rank correlation coefficient (SRCC) and the Pearson linear correlation coefficient (PLCC) across the ten splits are reported in the tables (see the evaluation sketch after the table). |
| Dataset Splits | Yes | For our experiments, we utilize the ten splits provided by LIQE, allocating 70% of images from each dataset for training, 10% for validation, and the remaining 20% for testing. |
| Hardware Specification | Yes | This process requires seven NVIDIA A40 GPUs to meet the computational load. During inference, a single NVIDIA RTX 3090 GPU is sufficient for executing the soft comparison (Sec. 3.4). |
| Software Dependencies | No | The paper names the specific models used (e.g., mPLUG-Owl2, CLIP-ViT-L14, LLaMA2-7B) and the training objective (GPT loss), but it does not specify version numbers for general software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Training is conducted with a batch size of 64 across all datasets, a fixed learning rate of 2 × 10⁻⁵, and spans two epochs. ... Furthermore, to obtain the anchor images, we divide the training set of KonIQ-10k into five (α = 5) quality intervals based on their MOSs [57], from which we select one (β = 1) representative anchor image per interval using Eqn. (4) (see the anchor-selection sketch after the table). |
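
The "Research Type" row describes converting comparative outputs into a single quality score at inference. Below is a minimal sketch of one plausible conversion, assuming Thurstone's Case V model links the model's win probabilities against quality anchors to a latent score; the function name, the anchor quality levels, and the averaging step are illustrative assumptions, not the paper's exact soft-comparison procedure (Sec. 3.4).

```python
import numpy as np
from scipy.stats import norm

def score_from_win_probs(win_probs, anchor_scores, eps=1e-4):
    """Infer a scalar quality score from win probabilities against anchors.

    Assumes Thurstone's Case V model: P(test beats anchor i) = Phi(s - q_i),
    where q_i is anchor i's known quality level. Each comparison then yields
    an estimate s = q_i + Phi^{-1}(p_i); we average the per-anchor estimates.
    This is a hypothetical reconstruction, not the paper's exact procedure.
    """
    p = np.clip(np.asarray(win_probs, dtype=float), eps, 1 - eps)  # avoid +/- inf
    q = np.asarray(anchor_scores, dtype=float)
    return float(np.mean(q + norm.ppf(p)))

# Example: five anchors at quality levels 1 (worst) to 5 (best).
print(score_from_win_probs([0.99, 0.9, 0.6, 0.3, 0.1], [1, 2, 3, 4, 5]))
```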
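
For the split-and-report protocol quoted in the "Open Datasets" row (ten LIQE splits, median SRCC/PLCC on the 20% test portions), here is a minimal evaluation sketch; the function and variable names are hypothetical, and only the median-over-ten-splits reporting is taken from the paper.

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

def median_correlations(per_split_predictions, per_split_mos):
    """Report median SRCC and PLCC over the ten test splits.

    per_split_predictions[k] -- model scores on the k-th test split
    per_split_mos[k]         -- ground-truth MOSs for the same images
    """
    srcc = [spearmanr(pred, mos).correlation
            for pred, mos in zip(per_split_predictions, per_split_mos)]
    plcc = [pearsonr(pred, mos)[0]
            for pred, mos in zip(per_split_predictions, per_split_mos)]
    return np.median(srcc), np.median(plcc)

# Toy usage with two splits of five images each (illustrative numbers only).
preds = [[0.2, 0.5, 0.6, 0.8, 0.9], [0.1, 0.3, 0.4, 0.7, 0.95]]
mos = [[20, 45, 55, 80, 88], [15, 35, 42, 70, 92]]
print(median_correlations(preds, mos))
```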
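
For the anchor-image selection in the "Experiment Setup" row, the following is a sketch under stated assumptions: the paper's Eqn. (4) is not reproduced here, so choosing the image whose MOS is closest to each interval's center is a hypothetical stand-in for it, and equal-width MOS intervals are likewise an assumption.

```python
import numpy as np

def select_anchors(mos, num_intervals=5, per_interval=1):
    """Pick anchor images from a training set by MOS.

    Splits the MOS range into `num_intervals` (alpha = 5) equal-width
    quality intervals and keeps `per_interval` (beta = 1) representatives
    per interval. The paper selects representatives via its Eqn. (4); as a
    stand-in, this sketch takes the image with MOS closest to the interval
    center. Returns indices into `mos`.
    """
    mos = np.asarray(mos, dtype=float)
    edges = np.linspace(mos.min(), mos.max(), num_intervals + 1)
    anchors = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = np.where((mos >= lo) & (mos <= hi))[0]
        center = (lo + hi) / 2.0
        # keep the `per_interval` images whose MOSs are nearest the center
        order = in_bin[np.argsort(np.abs(mos[in_bin] - center))]
        anchors.extend(order[:per_interval].tolist())
    return anchors

# Example: 1,000 synthetic MOSs on a 0-100 scale, one anchor per interval.
rng = np.random.default_rng(0)
print(select_anchors(rng.uniform(0, 100, size=1000)))
```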