Visual Attention Emerges from Recurrent Sparse Reconstruction

Authors: Baifeng Shi, Yale Song, Neel Joshi, Trevor Darrell, Xin Wang

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate VARS on five large-scale robustness benchmarks of naturally corrupted, adversarially perturbed and out-of-distribution images on ImageNet, where VARS consistently outperforms previous methods. We also assess the quality of attention maps on human eye fixation and image segmentation datasets, and show that VARS produces higher quality attention maps than self-attention."
Researcher Affiliation | Collaboration | "1 University of California, Berkeley; 2 Microsoft Research."
Pseudocode | No | Figure 2 provides an overview of VARS as a diagram with iterative steps, but it is not presented as structured pseudocode or an algorithm block.
Open Source Code | No | The paper does not provide any explicit statements about open-sourcing the code or links to a code repository.
Open Datasets | Yes | "We evaluate VARS on five large-scale robustness benchmarks of naturally corrupted, adversarially perturbed and out-of-distribution images on ImageNet, where VARS consistently outperforms previous methods. We also assess the quality of attention maps on human eye fixation and image segmentation datasets, and show that VARS produces higher quality attention maps than self-attention." Datasets used:
  - ImageNet-C (IN-C) (Hendrycks & Dietterich, 2019): natural corruption
  - ImageNet-R (IN-R) (Hendrycks et al., 2021a): out of distribution
  - ImageNet-SK (IN-SK) (Wang et al., 2019): out of distribution
  - PGD (Madry et al., 2017): adversarial attack
  - ImageNet-A (IN-A) (Hendrycks et al., 2021b): natural adversarial examples
  - PACS (Li et al., 2017): domain generalization
  - PASCAL VOC (Everingham et al., 2010): semantic segmentation
  - MIT1003 (Judd et al., 2009): human eye fixation
Dataset Splits | Yes | "We finetune the ImageNet-pretrained models on three source domains in PACS (Li et al., 2017) and test them on the left-out target domain... We evaluate RVT with self-attention and VARS on the validation set of PASCAL VOC 2012 using the model trained on ImageNet-1K."
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used.
Experiment Setup | Yes | "We choose k = 3 in our experiments for efficiency... We adopt λ = 0.3 in our experiments which has a slightly better performance than the other values."
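The Experiment Setup row quotes k = 3 recurrent iterations and a weight λ = 0.3. As a rough illustration of what recurrent sparse reconstruction means in general, here is a hypothetical ISTA-style sketch: it is not the paper's actual VARS update rule (which is defined only in the PDF), just the standard iterative shrinkage scheme that the title alludes to, with k and lam mirroring the quoted hyperparameters.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_reconstruction(x, D, k=3, lam=0.3):
    """Hypothetical ISTA sketch: find a sparse code z such that D @ z
    reconstructs x, unrolled for k recurrent steps. The values k = 3 and
    lam = 0.3 mirror the hyperparameters quoted above; the actual VARS
    update rule is given in the paper, not here."""
    eta = 1.0 / np.linalg.norm(D, ord=2) ** 2   # step size from the Lipschitz bound
    z = np.zeros(D.shape[1])
    for _ in range(k):
        residual = x - D @ z                    # current reconstruction error
        z = soft_threshold(z + eta * (D.T @ residual), eta * lam)
    return D @ z, z                             # reconstruction and sparse code

# Toy example with a random dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((8, 16))
x = rng.standard_normal(8)
x_hat, z = sparse_reconstruction(x, D)
```

Each iteration refines a sparse code z so that D @ z better reconstructs the input; the paper derives its attention maps from an analogous iterative reconstruction over visual tokens.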