Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision

Authors: Haoning Wu, Zicheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Annan Wang, Chunyi Li, Wenxiu Sun, Qiong Yan, Guangtao Zhai, Weisi Lin

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To address this gap, we present Q-Bench, a holistic benchmark crafted to systematically evaluate potential abilities of MLLMs on three realms: low-level visual perception, low-level visual description, and overall visual quality assessment. Our evaluation across the three abilities confirms that MLLMs possess preliminary low-level visual skills.
Researcher Affiliation | Collaboration | (1) S-Lab, Nanyang Technological University; (2) Shanghai Jiao Tong University; (3) SenseTime
Pseudocode | Yes | Algorithm 1: PyTorch-style Pseudo Code for Softmax-based Strategy for IQA with MLLMs
Open Source Code | Yes | Project Page: https://q-future.github.io/Q-Bench.
Open Datasets | Yes | For the assessment ability (A3), we utilize plenty of existing IQA databases (Hosu et al., 2020; Lin et al., 2019; Li et al., 2023c) that focus on various low-level appearances of images, to benchmark MLLMs within conventional IQA settings.
Dataset Splits | Yes | For a holistic examination on the perception ability of MLLMs, we evaluate the multi-choice correctness of MLLMs on different sub-categories of the LLVision dataset, which is equally divided as dev (Tab. 7, will be released) and test (Tab. 2, will keep private) subsets.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments were provided.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., library names with versions) were explicitly provided.
Experiment Setup | Yes | Under this principle, we conduct toy experiments on LLVisionQA on Shikra and LLaVA-v1, with two simple instruction strategies: (A) Direct Instruction, in which the prompt is designed as simple as "Rate the quality of the image." (B) Numerical Instruction, in which we specifically instruct numerical ratings, with the prompt: "Score the quality of the image from 1 to 5, with 1 as lowest and 5 as highest."
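
The Pseudocode row names a softmax-based strategy for IQA but does not reproduce Algorithm 1. The block below is a minimal sketch of what such a strategy might look like, not the authors' code. It assumes the score is obtained by comparing the model's next-token logits for one positive and one negative quality word. The token names "good" and "poor", the function name softmax_quality_score, and the dict-of-logits input are all illustrative assumptions. Plain Python stands in for PyTorch so the example runs without dependencies.

```python
import math


def softmax_quality_score(logits, good_token="good", poor_token="poor"):
    """Map next-token logits to a quality score in [0, 1].

    `logits` is a hypothetical dict from token string to raw logit,
    standing in for the output of an MLLM at the position where it
    would complete a prompt such as "The quality of the image is".
    """
    x_good = logits[good_token]
    x_poor = logits[poor_token]
    # Subtract the max before exponentiating for numerical stability.
    m = max(x_good, x_poor)
    e_good = math.exp(x_good - m)
    e_poor = math.exp(x_poor - m)
    # Two-way softmax: probability mass on the positive token only.
    return e_good / (e_good + e_poor)


# Illustrative logits, not real model output.
example_logits = {"good": 2.0, "poor": 0.5, "the": -1.0}
score = softmax_quality_score(example_logits)
print(round(score, 4))
```

Unlike the Direct and Numerical Instruction prompts in the Experiment Setup row, which depend on the text the model generates, a score of this kind is continuous and can be correlated directly with human opinion scores.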