Assessor360: Multi-sequence Network for Blind Omnidirectional Image Quality Assessment
Authors: Tianhe Wu, Shuwei Shi, Haoming Cai, Mingdeng Cao, Jing Xiao, Yinqiang Zheng, Yujiu Yang
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results demonstrate that Assessor360 outperforms state-of-the-art methods on multiple OIQA datasets. The code and models are available at https://github.com/TianheWu/Assessor360. |
| Researcher Affiliation | Collaboration | 1 Shenzhen International Graduate School, Tsinghua University 2 The University of Tokyo 3 University of Maryland, College Park 4 Pingan Group |
| Pseudocode | Yes | Algorithm 1 Viewport Sequence Generation (RPS Algorithm) |
| Open Source Code | Yes | The code and models are available at https://github.com/TianheWu/Assessor360. |
| Open Datasets | Yes | We train 300 epochs with batch size 4 on CVIQD [35], OIQA [11], IQA-ODI [46], and MVAQD [18] datasets without the authentic scanpath data. We also compare our RPS with two advanced learning-based scanpath prediction methods, ScanGAN360 [24] and ScanDMM [32], on the JUFE [12] and JXUFE [33] datasets, which have authentic scanpath data. |
| Dataset Splits | No | The paper explicitly states: "we randomly split 80% ODIs of each dataset for training, and the remaining 20% is used for testing". It does not explicitly mention a separate validation split for hyperparameter tuning or early stopping. While it mentions model selection based on "highest performance on the testset", this does not constitute a clear validation split. |
| Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., GPU model, CPU, memory). |
| Software Dependencies | No | The paper mentions using a "pre-trained Swin Transformer [23]" and "Adam [21]" for optimization, but it does not specify version numbers for any software dependencies, such as Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | We set the field of view (FoV) to 110° following [12, 33]. We use a pre-trained Swin Transformer [23] (base version) as our feature extraction backbone. The input viewport size H × W is fixed to 224 × 224. The number of viewport sequences N is set to 3 and the length of each sequence M is set to 5. We set the coordinates of the N starting points to (0°, 0°). The reduced dimension D is 128 and the number of GRU modules is set to 6. The number of CA operations n is 4. We set γ = 0.7 and β = 100 as the decreasing factor and scale factor values, respectively. We train 300 epochs with batch size 4 on the CVIQD [35], OIQA [11], IQA-ODI [46], and MVAQD [18] datasets without the authentic scanpath data. For optimization, we use Adam [21] and the learning rate is set to 1 × 10⁻⁵ in the training phase. We employ MSE loss to train our model. |
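
As noted in the Dataset Splits row above, the paper reports only a random 80%/20% split at the ODI level, with no separate validation set. A minimal sketch of such a split is given below, assuming the split is performed over image identifiers; the function name, identifier format, and seed handling are illustrative and are not taken from the released code.

```python
import random

def split_odis(odi_ids, train_ratio=0.8, seed=0):
    """Randomly split ODI identifiers into train/test subsets (80%/20%)."""
    rng = random.Random(seed)
    ids = list(odi_ids)
    rng.shuffle(ids)
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]

# Example: 16 ODIs -> 12 for training, 4 for testing (hypothetical identifiers).
train_ids, test_ids = split_odis([f"odi_{i:03d}" for i in range(16)])
```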
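
The Experiment Setup row pins down most training hyperparameters: Adam, learning rate 1 × 10⁻⁵, MSE loss, 300 epochs, batch size 4, and 224 × 224 viewports arranged in N = 3 sequences of length M = 5. The following PyTorch sketch reflects those settings only; the tensor layout, dataloader, and model interface are assumptions for illustration, not the authors' released implementation.

```python
import torch
from torch import nn, optim

# Hyperparameters quoted in the Experiment Setup row.
NUM_SEQUENCES = 3      # N viewport sequences
SEQUENCE_LENGTH = 5    # M viewports per sequence
VIEWPORT_SIZE = 224    # H = W = 224
REDUCED_DIM = 128      # D
EPOCHS = 300
BATCH_SIZE = 4
LEARNING_RATE = 1e-5

def train(model: nn.Module, loader: torch.utils.data.DataLoader, device="cuda"):
    """Training-loop sketch under the paper's stated settings: Adam, MSE loss, 300 epochs."""
    model = model.to(device)
    optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)
    criterion = nn.MSELoss()
    for epoch in range(EPOCHS):
        for viewports, mos in loader:
            # Assumed shapes: viewports (B, N, M, 3, 224, 224), mos (B,) quality scores.
            viewports, mos = viewports.to(device), mos.to(device)
            pred = model(viewports).squeeze(-1)
            loss = criterion(pred, mos)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```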