Information Bottleneck Approach to Spatial Attention Learning

Authors: Qiuxia Lai, Yu Li, Ailing Zeng, Minhao Liu, Hanqiu Sun, Qiang Xu

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that the proposed IB-inspired spatial attention mechanism can yield attention maps that neatly highlight the regions of interest while suppressing backgrounds, and bootstrap standard DNN structures for visual recognition tasks (e.g., image classification, fine-grained recognition, cross-domain classification). The attention maps are interpretable for the decision making of the DNNs as verified in the experiments.
Researcher Affiliation | Academia | Qiuxia Lai (1), Yu Li (1), Ailing Zeng (1), Minhao Liu (1), Hanqiu Sun (2), and Qiang Xu (1); (1) The Chinese University of Hong Kong, (2) University of Electronic Science and Technology of China
Pseudocode | No | The paper includes figures illustrating the framework (Fig. 1, Fig. 2) and mathematical equations, but no explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at this https URL.
Open Datasets | Yes | CIFAR-10 [Krizhevsky et al., 2009] contains 60,000 32×32 natural images of 10 classes, which are split into 50,000 training and 10,000 test images. CIFAR-100 [Krizhevsky et al., 2009] is similar to CIFAR-10, except that it has 100 classes. CUB-200-2011 (CUB) [Wah et al., 2011] contains 5,994 training and 5,794 testing bird images from 200 classes. SVHN collects 73,257 training, 26,032 testing, and 531,131 extra digit images from house numbers in street-view images. STL-10 contains 5,000 training and 8,000 test images of resolution 96×96 organized into 10 classes.
Dataset Splits | Yes | CIFAR-10 [Krizhevsky et al., 2009] contains 60,000 32×32 natural images of 10 classes, which are split into 50,000 training and 10,000 test images. CIFAR-100 [Krizhevsky et al., 2009] is similar to CIFAR-10, except that it has 100 classes. (A minimal loading sketch for these splits follows the table.)
Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python, PyTorch versions).
Experiment Setup | Yes | We set β = 0.01, λ_g = 0.4, and λ_c = 0.1 empirically. We experiment on K = 64, 128, 256, 512, 1024. As shown in Fig. 4 (b), K = 256 achieves the best performance. Fig. 4 (c) shows the classification accuracy when varying the number of anchor values Q, where Q between 20 and 50 gives better performance. We use original input images after data augmentation (random flipping and cropping with a padding of 4 pixels). (A configuration sketch collecting these values follows the table.)
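
The Open Datasets and Dataset Splits rows quote the standard CIFAR-10 split of 50,000 training and 10,000 test images, and the Experiment Setup row quotes the augmentation (random flipping, cropping with 4-pixel padding). The sketch below shows one way to load and verify that split; torchvision is an assumption here, since the paper does not state its data-loading stack.

```python
# Sketch only: loads CIFAR-10 with the standard 50,000/10,000 split quoted
# in the Dataset Splits row. torchvision is an assumed dependency, not one
# named by the paper.
import torchvision
import torchvision.transforms as T

# Augmentation as quoted in the Experiment Setup row:
# random flipping and cropping with a padding of 4 pixels.
train_tf = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
test_tf = T.ToTensor()

train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=train_tf)
test_set = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=test_tf)

# The standard torchvision split matches the counts quoted from the paper.
assert len(train_set) == 50_000
assert len(test_set) == 10_000
```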
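
The Experiment Setup row lists the reported hyperparameters. Below is a minimal sketch that gathers them into one configuration object; the class and field names are hypothetical, and each value carries only the meaning stated in the quoted text.

```python
from dataclasses import dataclass

# Hypothetical configuration container; field names are illustrative and
# do not come from the authors' released code.
@dataclass
class IBAttentionSetup:
    beta: float = 0.01        # β, set empirically per the paper
    lambda_g: float = 0.4     # λ_g, set empirically
    lambda_c: float = 0.1     # λ_c, set empirically
    K: int = 256              # swept over {64, 128, 256, 512, 1024}; 256 reported best (Fig. 4 (b))
    Q: int = 32               # number of anchor values; 20-50 reported to work best (Fig. 4 (c))
    crop_padding: int = 4     # random cropping with 4-pixel padding
    random_flip: bool = True  # random horizontal flipping

cfg = IBAttentionSetup()
print(cfg)
```

Note that Q = 32 is simply one value inside the reported 20-50 range, not a number the paper singles out.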