Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision
Authors: Haoning Wu, Zicheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Annan Wang, Chunyi Li, Wenxiu Sun, Qiong Yan, Guangtao Zhai, Weisi Lin
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To address this gap, we present Q-Bench, a holistic benchmark crafted to systematically evaluate potential abilities of MLLMs on three realms: low-level visual perception, low-level visual description, and overall visual quality assessment. Our evaluation across the three abilities confirms that MLLMs possess preliminary low-level visual skills. |
| Researcher Affiliation | Collaboration | ¹S-Lab, Nanyang Technological University; ²Shanghai Jiaotong University; ³Sensetime |
| Pseudocode | Yes | Algorithm 1 PyTorch-style Pseudo Code for Softmax-based Strategy for IQA with MLLMs |
| Open Source Code | Yes | Project Page: https://q-future.github.io/Q-Bench. |
| Open Datasets | Yes | For the assessment ability (A3), we utilize plenty of existing IQA databases (Hosu et al., 2020; Lin et al., 2019; Li et al., 2023c) that focus on various low-level appearances of images, to benchmark MLLMs within conventional IQA settings. |
| Dataset Splits | Yes | For a holistic examination on the perception ability of MLLMs, we evaluate the multi-choice correctness of MLLMs on different sub-categories of the LLVision dataset, which is equally divided as dev (Tab. 7, will be released) and test (Tab. 2, will keep private) subsets. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments were provided. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., library names with versions) were explicitly provided. |
| Experiment Setup | Yes | Under this principle, we conduct toy experiments on LLVisionQA on Shikra and LLaVA-v1, with two simple instruction strategies: (A) Direct Instruction, in which the prompt is designed as simple as "Rate the quality of the image". (B) Numerical Instruction, in which we specifically instruct numerical ratings, with the prompt: "Score the quality of the image from 1 to 5, with 1 as lowest and 5 as highest." |
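
The Pseudocode row names a softmax-based strategy for IQA but does not reproduce it. The sketch below illustrates one plausible reading, assuming the model exposes next-token logits for two opposing quality words (called `good` and `poor` here, an assumption not stated in the table) and that the score is the two-way softmax probability of the positive word. The function name and arguments are hypothetical, and plain Python stands in for the PyTorch-style original.

```python
import math


def softmax_quality_score(logit_good: float, logit_poor: float) -> float:
    """Two-way softmax over a positive and a negative token logit.

    Returns the probability mass on the positive token, a value in (0, 1).
    The token pair ("good" / "poor") is an assumption made for illustration.
    """
    # Subtract the larger logit before exponentiating for numerical stability.
    shift = max(logit_good, logit_poor)
    exp_good = math.exp(logit_good - shift)
    exp_poor = math.exp(logit_poor - shift)
    return exp_good / (exp_good + exp_poor)


if __name__ == "__main__":
    # Hypothetical logits; a real run would read them from the MLLM output.
    print(softmax_quality_score(2.0, -1.0))
```

Unlike the Direct and Numerical instruction prompts, which rely on parsing a generated rating, a score of this form is continuous, so it can be compared against human opinion scores with rank or linear correlation.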