Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification
Authors: Ardhendu Behera, Zachary Wharton, Pradeep R P G Hewage, Asish Bera
AAAI 2021, pp. 929-937
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach using six state-of-the-art (SotA) backbone networks and eight benchmark datasets. Our method significantly outperforms the SotA approaches on six datasets and is very competitive with the remaining two. |
| Researcher Affiliation | Academia | Ardhendu Behera, Zachary Wharton, Pradeep R P G Hewage and Asish Bera Department of Computer Science, Edge Hill University St Helen Road, Lancashire United Kingdom, L39 4QP beheraa@edgehill.ac.uk, zachary.wharton@go.edgehill.ac.uk, pradeep.hewage@edgehill.ac.uk, beraa@edgehill.ac.uk |
| Pseudocode | No | The paper describes the approach in text and figures, but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://ardhendubehera.github.io/cap/. |
| Open Datasets | Yes | We comprehensively evaluate our model on eight widely used benchmark FGVC datasets: Aircraft (Maji et al. 2013), Food-101 (Bossard, Guillaumin, and Gool 2014), Stanford Cars (Krause et al. 2013), Stanford Dogs (Khosla et al. 2011), Caltech Birds (CUB-200) (Wah et al. 2011), Oxford Flower (Nilsback and Zisserman 2008), Oxford-IIIT Pets (Parkhi et al. 2012), and NABirds (Van Horn et al. 2015). |
| Dataset Splits | Yes | Statistics of datasets and their train/test splits are shown in Table 1. We use the top-1 accuracy (%) for evaluation. |
| Hardware Specification | Yes | The model is trained for 150 epochs using an NVIDIA Titan V GPU (12 GB). |
| Software Dependencies | No | We use Keras+Tensorflow to implement our algorithm. The paper does not specify version numbers for these software dependencies. |
| Experiment Setup | Yes | In all our experiments, we resize images to size 256 × 256, apply data augmentation techniques of random rotation (±15 degrees), random scaling (1 ± 0.15) and then random cropping to select the final size of 224 × 224 from 256 × 256. We set the cluster size to 32 in our learnable pooling approach. We apply Stochastic Gradient Descent (SGD) optimizer to optimize the categorical cross-entropy loss function. The SGD is initialized with a momentum of 0.99, and an initial learning rate 1e-4, which is multiplied by 0.1 after every 50 epochs. The model is trained for 150 epochs using an NVIDIA Titan V GPU (12 GB). |
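The quoted training settings describe a step-decay learning-rate schedule: start at 1e-4 and multiply by 0.1 after every 50 epochs over 150 epochs. The helper below is a minimal sketch of that schedule under stated assumptions (the function name and 0-indexed epochs are illustrative choices, not from the paper):

```python
def step_decay_lr(epoch, initial_lr=1e-4, drop=0.1, epochs_per_drop=50):
    """Learning rate for a given 0-indexed epoch under the reported schedule:
    initial_lr multiplied by `drop` after every `epochs_per_drop` epochs."""
    return initial_lr * (drop ** (epoch // epochs_per_drop))

if __name__ == "__main__":
    # Epochs 0-49 use 1e-4, epochs 50-99 use 1e-5, epochs 100-149 use 1e-6.
    for epoch in (0, 49, 50, 100, 149):
        print(f"epoch {epoch:3d}: lr = {step_decay_lr(epoch):.1e}")
```

In a Keras+TensorFlow setup like the one the paper mentions, a function of this shape could be passed to the `tf.keras.callbacks.LearningRateScheduler` callback alongside an SGD optimizer configured with momentum 0.99.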