Visual Attention Emerges from Recurrent Sparse Reconstruction
Authors: Baifeng Shi, Yale Song, Neel Joshi, Trevor Darrell, Xin Wang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate VARS on five large-scale robustness benchmarks of naturally corrupted, adversarially perturbed and out-of-distribution images on ImageNet, where VARS consistently outperforms previous methods. We also assess the quality of attention maps on human eye fixation and image segmentation datasets, and show that VARS produces higher quality attention maps than self-attention. |
| Researcher Affiliation | Collaboration | ¹University of California, Berkeley; ²Microsoft Research. |
| Pseudocode | No | Figure 2 provides an overview of VARS as a diagram with iterative steps, but it is not presented as structured pseudocode or an algorithm block. |
| Open Source Code | No | The paper does not provide any explicit statements about open-sourcing the code or links to a code repository. |
| Open Datasets | Yes | We evaluate VARS on five large-scale robustness benchmarks of naturally corrupted, adversarially perturbed and out-of-distribution images on ImageNet, where VARS consistently outperforms previous methods. We also assess the quality of attention maps on human eye fixation and image segmentation datasets, and show that VARS produces higher quality attention maps than self-attention. Datasets listed in the paper (name: type): ImageNet-C (IN-C) (Hendrycks & Dietterich, 2019): natural corruption; ImageNet-R (IN-R) (Hendrycks et al., 2021a): out of distribution; ImageNet-SK (IN-SK) (Wang et al., 2019): out of distribution; PGD (Madry et al., 2017): adversarial attack; ImageNet-A (IN-A) (Hendrycks et al., 2021b): natural adversarial example; PACS (Li et al., 2017): domain generalization; PASCAL VOC (Everingham et al., 2010): semantic segmentation; MIT1003 (Judd et al., 2009): human eye fixation. |
| Dataset Splits | Yes | We finetune the ImageNet-pretrained models on three source domains in PACS (Li et al., 2017) and test them on the left-out target domain... We evaluate RVT with self-attention and VARS on the validation set of PASCAL VOC 2012 using the model trained on ImageNet-1K. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used. |
| Experiment Setup | Yes | We choose k = 3 in our experiments for efficiency... We adopt λ = 0.3 in our experiments which has a slightly better performance than the other values. |
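
The leave-one-domain-out protocol quoted in the Dataset Splits row (finetune on three PACS source domains, test on the held-out fourth) can be sketched as follows. This is a minimal illustration only, not the authors' code: the helper name `leave_one_domain_out` and the domain identifier strings are assumptions made for this example, and no data loading or training is shown.

```python
from typing import Iterator, List, Tuple

# The four PACS domains (Li et al., 2017). The exact identifier strings
# are an assumption for this sketch, not taken from the paper.
PACS_DOMAINS: List[str] = ["photo", "art_painting", "cartoon", "sketch"]


def leave_one_domain_out(domains: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (source_domains, target_domain) pairs.

    Each domain is held out once as the test target; the remaining
    domains are the sources used for finetuning.
    """
    for target in domains:
        sources = [d for d in domains if d != target]
        yield sources, target


if __name__ == "__main__":
    for sources, target in leave_one_domain_out(PACS_DOMAINS):
        print(f"finetune on {sources}, test on {target}")
```

With four domains this yields four source/target splits, one per held-out domain.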