What Do Deep Saliency Models Learn about Visual Attention?
Authors: Shi Chen, Ming Jiang, Qi Zhao
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By applying our framework, we conduct extensive analyses from various perspectives, including the positive and negative weights of semantics, the impact of training data and architectural designs, the progressive influences of fine-tuning, and common failure patterns of state-of-the-art deep saliency models. Additionally, we demonstrate the effectiveness of our framework by exploring visual attention characteristics in various application scenarios. Our method offers an interpretable interface that enables researchers to better understand the relationships between visual semantics and saliency prediction, as well as a tool for analyzing the performance of deep saliency models in various applications. |
| Researcher Affiliation | Academia | Shi Chen, Ming Jiang, Qi Zhao, Department of Computer Science and Engineering, University of Minnesota {chen4595, mjiang, qzhao}@umn.edu |
| Pseudocode | No | The paper describes the methodology using mathematical equations and text, but it does not include explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/szzexpoi/saliency_analysis. |
| Open Datasets | Yes | To correlate implicit features with interpretable visual semantics, we leverage the Visual Genome [20] dataset. We experiment with three state-of-the-art saliency prediction models, including SALICON [9], DINet [11] and TranSalNet [10]. All models are optimized with a combination of saliency evaluation metrics (i.e., Normalized Scanpath Saliency (NSS) [49], Correlation Coefficient (CC) [50], and KL-Divergence (KLD) [51]) as proposed in [8], and use ResNet-50 [13] as the backbone. ... trained on the SALICON [12] dataset ... three DINet models trained on different datasets: SALICON [12], OSIE [17], and MIT [16] ... on three commonly used saliency datasets, including OSIE [17], MIT [16], and SALICON [12]. (A hedged sketch of the combined metric loss follows the table.) |
| Dataset Splits | No | fine-tuned models with the best validation performance (Complete fine-tuning). |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or computational resources) used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of deep neural networks and models like ResNet-50, but it does not specify any software dependencies or libraries with their version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | No | All models are optimized with a combination of saliency evaluation metrics (i.e., Normalized Scanpath Saliency (NSS) [49], Correlation Coefficient (CC) [50], and KL-Divergence (KLD) [51]) as proposed in [8], and use ResNet-50 [13] as the backbone. Model training follows a two-step paradigm: (1) The model is optimized to factorize features with trainable bases... (2) We freeze the model weights learned in the previous step and reroute the saliency inference... Only the last layer W_sal is fine-tuned... (A hedged sketch of this two-step paradigm follows the table.) |
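
The quoted setup trains with a combination of NSS, CC, and KLD, as proposed in [8]. Below is a minimal PyTorch sketch of such a combined objective, assuming the common convention of maximizing the NSS and CC similarity scores while minimizing the KLD distance; the exact term weighting is not given in the quotes, so the equal weights and function names here are our assumptions, not the paper's implementation.

```python
import torch

def nss(pred, fix_map):
    """Normalized Scanpath Saliency: mean of the standardized
    predicted map at fixated pixels (fix_map is a binary fixation map)."""
    pred = (pred - pred.mean()) / (pred.std() + 1e-8)
    return pred[fix_map > 0].mean()

def cc(pred, gt):
    """Pearson correlation coefficient between predicted and
    ground-truth saliency maps."""
    p, g = pred - pred.mean(), gt - gt.mean()
    return (p * g).sum() / torch.sqrt((p * p).sum() * (g * g).sum() + 1e-8)

def kld(pred, gt, eps=1e-8):
    """KL-Divergence, treating both maps as spatial probability
    distributions (normalized to sum to 1)."""
    p = pred / (pred.sum() + eps)
    g = gt / (gt.sum() + eps)
    return (g * torch.log(g / (p + eps) + eps)).sum()

def combined_loss(pred, gt_map, fix_map):
    # NSS and CC are similarity scores, so they are negated;
    # KLD is a distance. Equal weighting is an assumption.
    return kld(pred, gt_map) - nss(pred, fix_map) - cc(pred, gt_map)
```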
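The Experiment Setup row describes a two-step paradigm: first optimize the model to factorize features with trainable bases, then freeze those weights, reroute saliency inference through the bases, and fine-tune only the last layer W_sal. The sketch below illustrates that training flow under stated assumptions: the module name `FactorizedSaliency`, the softmax-based factorization, and parameters such as `num_bases` are illustrative placeholders, not the paper's actual formulation.

```python
import torch
import torch.nn as nn
import torchvision

# ResNet-50 trunk without the pooling/classification head.
backbone = nn.Sequential(
    *list(torchvision.models.resnet50(weights=None).children())[:-2])

class FactorizedSaliency(nn.Module):
    def __init__(self, backbone, num_bases=64, feat_dim=2048):
        super().__init__()
        self.backbone = backbone
        # Trainable bases that factorize backbone features (step 1).
        self.bases = nn.Parameter(torch.randn(num_bases, feat_dim))
        # Last layer W_sal: maps per-pixel basis weights to saliency.
        self.w_sal = nn.Conv2d(num_bases, 1, kernel_size=1)

    def forward(self, x):
        feats = self.backbone(x)                  # [B, C, H, W]
        b, c, h, w = feats.shape
        flat = feats.flatten(2).transpose(1, 2)   # [B, HW, C]
        # Per-pixel weights over the bases (a soft factorization).
        weights = torch.softmax(flat @ self.bases.t(), dim=-1)
        weights = weights.transpose(1, 2).reshape(b, -1, h, w)
        return torch.sigmoid(self.w_sal(weights)) # saliency map

model = FactorizedSaliency(backbone)

# Step 1: optimize everything so the bases factorize the features,
# training with a combined NSS/CC/KLD loss as sketched above.
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Step 2: freeze the weights learned in step 1 and reroute saliency
# inference through the basis weights; only w_sal is fine-tuned.
for p in model.parameters():
    p.requires_grad = False
for p in model.w_sal.parameters():
    p.requires_grad = True
opt = torch.optim.Adam(model.w_sal.parameters(), lr=1e-4)
```

Freezing everything except `w_sal` in step 2 is what makes the basis weights an interpretable interface: the fine-tuned last layer directly exposes how each semantic basis contributes, positively or negatively, to the predicted saliency.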