Assessor360: Multi-sequence Network for Blind Omnidirectional Image Quality Assessment

Authors: Tianhe Wu, Shuwei Shi, Haoming Cai, Mingdeng Cao, Jing Xiao, Yinqiang Zheng, Yujiu Yang

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results demonstrate that Assessor360 outperforms state-of-the-art methods on multiple OIQA datasets. The code and models are available at https://github.com/TianheWu/Assessor360.
Researcher Affiliation | Collaboration | 1. Shenzhen International Graduate School, Tsinghua University; 2. The University of Tokyo; 3. University of Maryland, College Park; 4. Pingan Group
Pseudocode | Yes | Algorithm 1: Viewport Sequence Generation (RPS Algorithm). (A hedged interface sketch follows the table.)
Open Source Code | Yes | The code and models are available at https://github.com/TianheWu/Assessor360.
Open Datasets | Yes | We train 300 epochs with batch size 4 on the CVIQD [35], OIQA [11], IQA-ODI [46], and MVAQD [18] datasets without the authentic scanpath data. We compare our RPS with two advanced learning-based scanpath prediction methods, ScanGAN360 [24] and ScanDMM [32], on the JUFE [12] and JXUFE [33] datasets, which have authentic scanpath data.
Dataset Splits | No | The paper explicitly states: "we randomly split 80% ODIs of each dataset for training, and the remaining 20% is used for testing". It does not mention a separate validation split for hyperparameter tuning or early stopping. While it mentions model selection based on the "highest performance on the testset", this does not constitute a proper validation split. (A minimal sketch of such an 80/20 split follows the table.)
Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., GPU model, CPU, memory).
Software Dependencies | No | The paper mentions using a "pre-trained Swin Transformer [23]" and "Adam [21]" for optimization, but it does not specify version numbers for any software dependencies such as Python, PyTorch, or other libraries.
Experiment Setup | Yes | We set the field of view (FoV) to 110° following [12, 33]. We use a pre-trained Swin Transformer [23] (base version) as our feature extraction backbone. The input viewport size H × W is fixed to 224 × 224. The number of viewport sequences N is set to 3 and the length of each sequence M is set to 5. We set the coordinates of the N starting points to (0°, 0°). The reduced dimension D is 128 and the number of GRU modules is set to 6. The number of CA operations n is 4. We set γ = 0.7 and β = 100 as the decreasing factor and scale factor, respectively. We train 300 epochs with batch size 4 on the CVIQD [35], OIQA [11], IQA-ODI [46], and MVAQD [18] datasets without the authentic scanpath data. For optimization, we use Adam [21] and the learning rate is set to 1 × 10⁻⁵ in the training phase. We employ MSE loss to train our model. (These settings are collected into the configuration sketch after the table.)
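
Algorithm 1 in the paper gives the RPS procedure in full; the sketch below illustrates only its interface, generating N = 3 viewport sequences of length M = 5 starting from (0°, 0°). The per-step update `sample_next_position` is a hypothetical placeholder, not the authors' probability-based sampling rule.

```python
import random
from typing import List, Tuple

def sample_next_position(lon: float, lat: float) -> Tuple[float, float]:
    """Hypothetical placeholder for the RPS sampling rule of Algorithm 1.
    Here we take a small random step on the sphere, which is NOT the
    authors' probability-guided update."""
    new_lon = (lon + random.uniform(-30.0, 30.0)) % 360.0
    new_lat = max(-90.0, min(90.0, lat + random.uniform(-15.0, 15.0)))
    return new_lon, new_lat

def generate_viewport_sequences(n_sequences: int = 3, seq_length: int = 5,
                                start: Tuple[float, float] = (0.0, 0.0)
                                ) -> List[List[Tuple[float, float]]]:
    """Generate N viewport sequences of length M, all starting from (0°, 0°),
    matching the sequence shape used in the paper's experiment setup."""
    sequences = []
    for _ in range(n_sequences):
        lon, lat = start
        seq = [(lon, lat)]
        for _ in range(seq_length - 1):
            lon, lat = sample_next_position(lon, lat)
            seq.append((lon, lat))
        sequences.append(seq)
    return sequences
```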
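
On dataset splits, the paper only reports a random 80%/20% train/test division of the ODIs in each dataset, with no validation set; a minimal sketch of such a split (the file names and seed below are illustrative) might look like:

```python
import random

def split_odis(odi_paths, train_ratio=0.8, seed=0):
    """Randomly split a list of ODI file paths into train/test subsets,
    mirroring the 80%/20% split reported in the paper (no validation set)."""
    paths = list(odi_paths)
    random.Random(seed).shuffle(paths)
    cut = int(len(paths) * train_ratio)
    return paths[:cut], paths[cut:]

# Example usage with hypothetical file names:
train_files, test_files = split_odis([f"odi_{i:04d}.png" for i in range(100)])
```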
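
The reported experiment settings can be gathered into one configuration and wired into a generic PyTorch training loop with Adam (learning rate 1 × 10⁻⁵) and MSE loss, as sketched below; the model and data loader passed in are stand-ins, not the released implementation.

```python
import torch
import torch.nn as nn

# Hyperparameters as reported in the paper's experiment setup.
config = dict(
    fov_deg=110, viewport_size=(224, 224),  # FoV and input viewport resolution
    n_sequences=3, seq_length=5,            # N viewport sequences of length M
    start_point=(0.0, 0.0),                 # starting coordinates of each sequence
    reduced_dim=128, n_gru=6, n_ca=4,       # D, number of GRU modules, CA operations
    gamma=0.7, beta=100,                    # decreasing factor and scale factor
    epochs=300, batch_size=4, lr=1e-5,      # training schedule
)

def train(model: nn.Module, loader, cfg=config, device="cuda"):
    """Generic training skeleton using the paper's reported optimizer and loss."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=cfg["lr"])
    criterion = nn.MSELoss()
    for epoch in range(cfg["epochs"]):
        for viewports, mos in loader:  # viewport sequences and mean opinion scores
            viewports, mos = viewports.to(device), mos.to(device)
            pred = model(viewports)
            loss = criterion(pred.squeeze(-1), mos)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```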