Data-Efficient Image Quality Assessment with Attention-Panel Decoder
Authors: Guanyi Qin, Runze Hu, Yutao Liu, Xiawu Zheng, Haotian Liu, Xiu Li, Yan Zhang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | extensive experiments on eight standard BIQA datasets (both synthetic and authentic) demonstrate its superior performance to the state-of-the-art BIQA methods |
| Researcher Affiliation | Academia | 1 Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; 2 School of Information and Electronics, Beijing Institute of Technology, Beijing 100086, China; 3 School of Computer Science and Technology, Ocean University of China, Qingdao 266100, China; 4 Peng Cheng Laboratory, Shenzhen 518066, China; 5 Media Analytics and Computing Lab, Department of Artificial Intelligence, School of Informatics, Xiamen University, Xiamen 361005, China |
| Pseudocode | No | The paper describes the architecture and operations in text and diagrams (Figure 2), but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Checkpoints, logs and code will be available at https://github.com/narthchin/DEIQT. |
| Open Datasets | Yes | We evaluate the performance of the proposed DEIQT on 8 typical BIQA datasets, including 4 synthetic datasets of LIVE (Sheikh, Sabir, and Bovik 2006), CSIQ (Larson and Chandler 2010), TID2013 (Ponomarenko et al. 2015), and KADID (Lin, Hosu, and Saupe 2019), and 4 authentic datasets of LIVEC (Ghadiyaram and Bovik 2015), KONIQ (Hosu et al. 2020), LIVEFB (Ying et al. 2020), and SPAQ (Fang et al. 2020). |
| Dataset Splits | No | For each dataset, 80% of the images were used for training and the remaining 20% for testing. We repeated this process 10 times to mitigate the performance bias, and the medians of SRCC and PLCC were reported. The paper specifies training and testing splits, but does not explicitly mention a separate validation set split (see the split-protocol sketch after this table). |
| Hardware Specification | No | The paper describes the experimental setup in terms of software parameters and training configurations, but it does not specify any particular hardware (e.g., GPU model, CPU type) used for the experiments. |
| Software Dependencies | No | The paper mentions using the 'ViT-S proposed in DeiT III' and the 'Layer-wise Adaptive Moments optimizer', but it does not provide specific version numbers for any software, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For DEIQT, we followed the typical training strategy of randomly cropping an input image into 10 image patches with a resolution of 224 × 224. Each image patch was then reshaped into a sequence of patches with patch size p = 16 and input token dimension D = 384. We created the Transformer encoder based on the ViT-S proposed in DeiT III (Touvron, Cord, and Jégou 2022). The depth of the encoder was set to 12, and the number of heads h = 6. For the decoder, the depth was set to 1 and the number of panel members L = 6. The encoder of DEIQT was pre-trained on ImageNet-1K for 400 epochs using the Layer-wise Adaptive Moments optimizer (You et al. 2020) with a batch size of 2048. DEIQT was trained for 9 epochs. The learning rate was set to 2 × 10⁻⁴ with a decay factor of 10 every 3 epochs. The batch size depended on the size of the dataset, i.e., 16 and 128 for LIVEC and KonIQ, respectively (see the configuration sketch after this table). |
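
The split-and-evaluate protocol quoted in the Dataset Splits row can be made concrete with a short sketch. This is a minimal illustration, not the authors' code (which is at the GitHub link above): the `train_and_predict` helper and the random seed are hypothetical placeholders, while the 80/20 ratio, the 10 repetitions, and the median SRCC/PLCC reporting come from the quoted text.

```python
# Minimal sketch of the 80/20 split protocol repeated 10 times, with medians
# of SRCC and PLCC reported. train_and_predict() is a hypothetical helper.
import numpy as np
from scipy.stats import pearsonr, spearmanr

def run_protocol(mos, train_and_predict, repeats=10, train_ratio=0.8, seed=0):
    """mos: array of subjective quality scores, one per image.
    train_and_predict(train_idx, test_idx) -> predicted scores for test_idx."""
    rng = np.random.default_rng(seed)
    srcc_runs, plcc_runs = [], []
    for _ in range(repeats):
        perm = rng.permutation(len(mos))        # fresh random 80/20 split
        cut = int(train_ratio * len(mos))
        train_idx, test_idx = perm[:cut], perm[cut:]
        pred = train_and_predict(train_idx, test_idx)
        srcc_runs.append(spearmanr(pred, mos[test_idx])[0])
        plcc_runs.append(pearsonr(pred, mos[test_idx])[0])
    # Medians over the repetitions mitigate split-induced performance bias.
    return float(np.median(srcc_runs)), float(np.median(plcc_runs))
```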
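
The Experiment Setup row spreads the reported hyperparameters across several sentences; gathered into one plain Python dictionary they look roughly like the sketch below. The field names are assumptions for readability and do not reflect the schema of the authors' released configuration files; the values are taken from the quoted text.

```python
# Illustrative collection of the quoted DEIQT hyperparameters.
# Key names are assumptions; values come from the Experiment Setup row.
deiqt_setup = {
    "input": {
        "crops_per_image": 10,          # random crops per input image
        "crop_resolution": (224, 224),
        "patch_size": 16,               # p
        "token_dim": 384,               # D
    },
    "encoder": {
        "backbone": "ViT-S (DeiT III)",
        "depth": 12,
        "num_heads": 6,                 # h
        "pretraining": {
            "dataset": "ImageNet-1K",
            "epochs": 400,
            "optimizer": "Layer-wise Adaptive Moments (LAMB)",
            "batch_size": 2048,
        },
    },
    "decoder": {
        "depth": 1,
        "panel_members": 6,             # L, the attention-panel size
    },
    "finetuning": {
        "epochs": 9,
        "learning_rate": 2e-4,
        "lr_decay": {"factor": 10, "every_n_epochs": 3},
        "batch_size": {"LIVEC": 16, "KonIQ": 128},  # dataset-dependent
    },
}
```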