Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification
Authors: Ardhendu Behera, Zachary Wharton, Pradeep R P G Hewage, Asish Bera
AAAI 2021, pp. 929-937
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach using six state-of-the-art (SotA) backbone networks and eight benchmark datasets. Our method significantly outperforms the SotA approaches on six datasets and is very competitive with the remaining two. |
| Researcher Affiliation | Academia | Ardhendu Behera, Zachary Wharton, Pradeep R P G Hewage and Asish Bera Department of Computer Science, Edge Hill University St Helen Road, Lancashire United Kingdom, L39 4QP beheraa@edgehill.ac.uk, zachary.wharton@go.edgehill.ac.uk, pradeep.hewage@edgehill.ac.uk, beraa@edgehill.ac.uk |
| Pseudocode | No | The paper describes the approach in text and figures, but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://ardhendubehera.github.io/cap/. |
| Open Datasets | Yes | We comprehensively evaluate our model on eight widely used benchmark FGVC datasets: Aircraft (Maji et al. 2013), Food-101 (Bossard, Guillaumin, and Gool 2014), Stanford Cars (Krause et al. 2013), Stanford Dogs (Khosla et al. 2011), Caltech Birds (CUB-200) (Wah et al. 2011), Oxford Flower (Nilsback and Zisserman 2008), Oxford-IIIT Pets (Parkhi et al. 2012), and NABirds (Van Horn et al. 2015). |
| Dataset Splits | Yes | Statistics of datasets and their train/test splits are shown in Table 1. We use the top-1 accuracy (%) for evaluation. |
| Hardware Specification | Yes | The model is trained for 150 epochs using an NVIDIA Titan V GPU (12 GB). |
| Software Dependencies | No | We use Keras+Tensorflow to implement our algorithm. The paper does not specify version numbers for these software dependencies. |
| Experiment Setup | Yes | In all our experiments, we resize images to size 256 × 256, apply data augmentation techniques of random rotation (±15 degrees), random scaling (1 ± 0.15) and then random cropping to select the final size of 224 × 224 from 256 × 256. We set the cluster size to 32 in our learnable pooling approach. We apply Stochastic Gradient Descent (SGD) optimizer to optimize the categorical cross-entropy loss function. The SGD is initialized with a momentum of 0.99, and an initial learning rate 1e-4, which is multiplied by 0.1 after every 50 epochs. The model is trained for 150 epochs using an NVIDIA Titan V GPU (12 GB). |
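The quoted training settings describe a step-decay learning-rate schedule: start at 1e-4 and multiply by 0.1 after every 50 epochs over 150 epochs. The helper below is a minimal sketch of that schedule under stated assumptions (the function name and 0-indexed epochs are illustrative choices, not from the paper):

```python
def step_decay_lr(epoch, initial_lr=1e-4, drop=0.1, epochs_per_drop=50):
    """Learning rate for a given 0-indexed epoch under the reported schedule:
    initial_lr multiplied by `drop` after every `epochs_per_drop` epochs."""
    return initial_lr * (drop ** (epoch // epochs_per_drop))

if __name__ == "__main__":
    # Epochs 0-49 use 1e-4, epochs 50-99 use 1e-5, epochs 100-149 use 1e-6.
    for epoch in (0, 49, 50, 100, 149):
        print(f"epoch {epoch:3d}: lr = {step_decay_lr(epoch):.1e}")
```

In a Keras+TensorFlow setup like the one the paper mentions, a function of this shape could be passed to the `tf.keras.callbacks.LearningRateScheduler` callback alongside an SGD optimizer configured with momentum 0.99.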