reproducibilityindex.ai

Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision

Authors: Haoning Wu, Zicheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Annan Wang, Chunyi Li, Wenxiu Sun, Qiong Yan, Guangtao Zhai, Weisi Lin

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To address this gap, we present Q-Bench, a holistic benchmark crafted to systematically evaluate potential abilities of MLLMs on three realms: low-level visual perception, low-level visual description, and overall visual quality assessment. Our evaluation across the three abilities confirms that MLLMs possess preliminary low-level visual skills.
Researcher Affiliation	Collaboration	(1) S-Lab, Nanyang Technological University; (2) Shanghai Jiaotong University; (3) Sensetime
Pseudocode	Yes	Algorithm 1 Pytorch-style Pseudo Code for Softmax-based Strategy for IQA with MLLMs
Open Source Code	Yes	Project Page: https://q-future.github.io/Q-Bench.
Open Datasets	Yes	For the assessment ability (A3), we utilize plenty of existing IQA databases (Hosu et al., 2020; Lin et al., 2019; Li et al., 2023c) that focus on various low-level appearances of images, to benchmark MLLMs within conventional IQA settings.
Dataset Splits	Yes	For a holistic examination on the perception ability of MLLMs, we evaluate the multi-choice correctness of MLLMs on different sub-categories of the LLVision dataset, which is equally divided as dev (Tab. 7, will be released) and test (Tab. 2, will keep private) subsets.
Hardware Specification	No	No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments were provided.
Software Dependencies	No	No specific software dependencies with version numbers (e.g., library names with versions) were explicitly provided.
Experiment Setup	Yes	Under this principle, we conduct toy experiments on LLVisionQA on Shikra and LLaVA-v1, with two simple instruction strategies: (A) Direct Instruction, in which the prompt is designed as simple as "Rate the quality of the image." (B) Numerical Instruction, in which we specifically instruct numerical ratings, with the prompt: "Score the quality of the image from 1 to 5, with 1 as lowest and 5 as highest."
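
The Pseudocode row above names a softmax-based strategy for IQA with MLLMs, but this page does not reproduce the algorithm. The block below is only a minimal sketch of what such a strategy could look like. It assumes (this is not stated on this page) that the strategy reads the model's next-token logits for two opposing quality words, here "good" and "poor", and uses the softmax probability of the positive word as a continuous quality score. The function name, the token pair, and the logit values are hypothetical, and plain Python stands in for PyTorch so the example has no dependencies.

```python
import math


def softmax_quality_score(logit_good: float, logit_poor: float) -> float:
    """Softmax restricted to two token logits; returns the probability of 'good'.

    Assumption: the two logits come from the same next-token position of an
    MLLM prompted to rate image quality. Obtaining them is model-specific and
    not shown here.
    """
    # Subtract the larger logit before exponentiating for numerical stability.
    m = max(logit_good, logit_poor)
    e_good = math.exp(logit_good - m)
    e_poor = math.exp(logit_poor - m)
    return e_good / (e_good + e_poor)


# Hypothetical next-token logits, keyed by token text (illustrative values only).
logits = {"good": 2.0, "poor": 0.5, "the": 3.1}

# Only the two quality tokens enter the softmax; other tokens are ignored.
score = softmax_quality_score(logits["good"], logits["poor"])
print(f"quality score in [0, 1]: {score:.3f}")
```

Unlike the numerical instruction quoted in the Experiment Setup row, which asks the model to emit a rating from 1 to 5 as text, a score of this kind is continuous and does not depend on parsing the generated answer.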