Information Bottleneck Approach to Spatial Attention Learning
Authors: Qiuxia Lai, Yu Li, Ailing Zeng, Minhao Liu, Hanqiu Sun, Qiang Xu
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that the proposed IB-inspired spatial attention mechanism can yield attention maps that neatly highlight the regions of interest while suppressing backgrounds, and bootstrap standard DNN structures for visual recognition tasks (e.g., image classification, fine-grained recognition, cross-domain classification). The attention maps are interpretable for the decision making of the DNNs as verified in the experiments. |
| Researcher Affiliation | Academia | Qiuxia Lai¹, Yu Li¹, Ailing Zeng¹, Minhao Liu¹, Hanqiu Sun² and Qiang Xu¹; ¹The Chinese University of Hong Kong; ²University of Electronic Science and Technology of China |
| Pseudocode | No | The paper includes figures illustrating the framework (Fig. 1, Fig. 2) and mathematical equations, but no explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at this https URL. |
| Open Datasets | Yes | CIFAR-10 [Krizhevsky et al., 2009] contains 60,000 32×32 natural images of 10 classes, which are split into 50,000 training and 10,000 test images. CIFAR-100 [Krizhevsky et al., 2009] is similar to CIFAR-10, except that it has 100 classes. CUB-200-2011 (CUB) [Wah et al., 2011] contains 5,994 training and 5,794 testing bird images from 200 classes. SVHN collects 73,257 training, 26,032 testing, and 531,131 extra digit images from house numbers in street view images. STL-10 contains 5,000 training and 8,000 test images of resolution 96×96 organized into 10 classes. (A loader sketch for these benchmarks follows the table.) |
| Dataset Splits | Yes | CIFAR-10 [Krizhevsky et al., 2009] contains 60,000 32×32 natural images of 10 classes, which are split into 50,000 training and 10,000 test images. CIFAR-100 [Krizhevsky et al., 2009] is similar to CIFAR-10, except that it has 100 classes. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python, PyTorch versions). |
| Experiment Setup | Yes | We set β = 0.01, λg = 0.4, and λc = 0.1 empirically. We experiment on K = 64, 128, 256, 512, 1024. As shown in Fig. 4 (b), K = 256 achieves the best performance. Fig. 4 (c) shows the classification accuracy when varying the number of anchor values Q, where Q between 20 and 50 gives better performance. We use original input images after data augmentation (random flipping and cropping with a padding of 4 pixels). (A configuration sketch follows the table.) |
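
The benchmarks quoted in the Open Datasets and Dataset Splits rows are all standard and map onto off-the-shelf loaders. Below is a minimal sketch, assuming PyTorch/torchvision (the paper does not name its framework); the `root` path is illustrative, and CUB-200-2011 is omitted because torchvision ships no loader for it.

```python
# Minimal loader sketch for the benchmarks quoted above; assumes
# torchvision, which the paper does not explicitly name.
from torchvision import datasets

root = "./data"  # illustrative download location

# CIFAR-10/100: 50,000 training / 10,000 test images at 32x32.
cifar10_train = datasets.CIFAR10(root, train=True, download=True)
cifar10_test = datasets.CIFAR10(root, train=False, download=True)
cifar100_train = datasets.CIFAR100(root, train=True, download=True)

# SVHN: 73,257 training, 26,032 test, and 531,131 "extra" digit images.
svhn_train = datasets.SVHN(root, split="train", download=True)
svhn_test = datasets.SVHN(root, split="test", download=True)
svhn_extra = datasets.SVHN(root, split="extra", download=True)

# STL-10: 5,000 training / 8,000 test images at 96x96, 10 classes.
stl10_train = datasets.STL10(root, split="train", download=True)
stl10_test = datasets.STL10(root, split="test", download=True)
```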
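
The Experiment Setup row pins down the loss weights (β, λg, λc), the best-performing quantization level K = 256, the useful range for the anchor count Q, and the augmentation recipe. A minimal configuration sketch under the same torchvision assumption follows; the constant names are ours, the horizontal flip is an assumed reading of "random flipping", and Q = 30 is only an illustrative pick from the reported 20 to 50 range.

```python
import torchvision.transforms as T

# Hyperparameters quoted in the Experiment Setup row.
BETA = 0.01      # IB trade-off weight β
LAMBDA_G = 0.4   # λg
LAMBDA_C = 0.1   # λc
K = 256          # best of the tested {64, 128, 256, 512, 1024}
Q = 30           # illustrative pick from the reported 20-50 range

# "Random flipping and cropping with a padding of 4 pixels" on the
# original 32x32 inputs (CIFAR-scale); horizontal flip assumed.
train_transform = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomCrop(32, padding=4),
    T.ToTensor(),
])
```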