Learning Attentive Pairwise Interaction for Fine-Grained Classification

Authors: Peiqin Zhuang, Yali Wang, Yu Qiao

AAAI 2020, pp. 13130-13137

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on five popular benchmarks in fine-grained classification. API-Net outperforms the recent SOTA methods, i.e., CUB-200-2011 (90.0%), Aircraft (93.9%), Stanford Cars (95.3%), Stanford Dogs (90.3%), and NABirds (88.1%).
Researcher Affiliation | Collaboration | (1) Shenzhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; (2) SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | We evaluate API-Net on five popular fine-grained benchmarks, i.e., CUB-200-2011 (Wah et al. 2011), Aircraft (Maji et al. 2013), Stanford Cars (Krause et al. 2013), Stanford Dogs (Khosla et al. 2011) and NABirds (Van Horn et al. 2015).
Dataset Splits | Yes | We use the official train & test splits for evaluation. For all the datasets, we randomly sample 30 categories in each batch. For each category, we randomly sample 4 images. For each image, we find its most similar image from its own class and from the rest of the classes, according to the Euclidean distance between features. As a result, we obtain an intra pair and an inter pair for each image in the batch. (See the pairing sketch after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions "PyTorch" but does not specify a version number for this or any other software dependency.
Experiment Setup | Yes | First, we resize each image to 512 × 512 and crop a 448 × 448 patch as input to API-Net (train: random cropping, test: center cropping). Furthermore, we use ResNet-101 (pretrained on ImageNet) as the CNN backbone and extract the feature vector x_i ∈ R^2048 after the global average pooling operation. Second, for all the datasets, we randomly sample 30 categories in each batch. For each category, we randomly sample 4 images. [...] For all the datasets, the coefficient λ in Eq. (8) is 1.0, and the margin ϵ in the score-ranking regularization is 0.05. We use standard SGD with momentum of 0.9 and weight decay of 0.0005. Furthermore, the initial learning rate is 0.01 (0.001 for Stanford Dogs), adjusted with a cosine annealing strategy. The total number of training epochs is 100 (50 for Stanford Dogs). Besides, during the training phase, we freeze the conv layers and only train the newly-added fully-connected layers in the first 8 epochs (12 epochs for Stanford Dogs). (See the training-setup sketch after the table.)
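
The pairing procedure in the Dataset Splits row is concrete enough to sketch. Below is a minimal PyTorch rendering, assuming a batch of backbone features and integer class labels; build_pairs, its arguments, and the dummy batch are illustrative names of ours, not code from the paper.

```python
import torch

def build_pairs(features, labels):
    """For each image in the batch, return the index of its most similar
    image (smallest Euclidean distance in feature space) from its own
    class (intra pair) and from the rest of the classes (inter pair)."""
    dist = torch.cdist(features, features)      # pairwise Euclidean distances
    dist.fill_diagonal_(float("inf"))           # never pair an image with itself
    same_class = labels.unsqueeze(0) == labels.unsqueeze(1)

    intra_idx = dist.masked_fill(~same_class, float("inf")).argmin(dim=1)
    inter_idx = dist.masked_fill(same_class, float("inf")).argmin(dim=1)
    return intra_idx, inter_idx

# Batch layout from the paper: 30 categories x 4 images = 120 images,
# each represented by a 2048-d ResNet-101 feature after global average pooling.
features = torch.randn(120, 2048)
labels = torch.arange(30).repeat_interleave(4)
intra_idx, inter_idx = build_pairs(features, labels)
```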
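
The Experiment Setup row likewise maps onto standard PyTorch/torchvision calls. The sketch below wires up the reported preprocessing and optimization hyperparameters under stated assumptions: the API-Net head is reduced to a single placeholder nn.Linear (200 classes, as for CUB-200-2011), since the paper releases no reference implementation.

```python
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms

# Input pipeline: resize to 512 x 512, then crop a 448 x 448 patch
# (random crop at train time, center crop at test time).
train_tf = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.RandomCrop(448),
    transforms.ToTensor(),
])
test_tf = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.CenterCrop(448),
    transforms.ToTensor(),
])

# ImageNet-pretrained ResNet-101 backbone. Freeze the conv layers first,
# as done for the first 8 epochs (12 for Stanford Dogs); the replacement
# head added afterwards stays trainable. The real API-Net head is richer
# than this placeholder linear classifier.
backbone = torchvision.models.resnet101(pretrained=True)
for p in backbone.parameters():
    p.requires_grad = False
backbone.fc = nn.Linear(2048, 200)  # hypothetical head, e.g. 200 CUB classes

optimizer = torch.optim.SGD(
    (p for p in backbone.parameters() if p.requires_grad),
    lr=0.01,            # 0.001 for Stanford Dogs
    momentum=0.9,
    weight_decay=0.0005,
)
# Cosine annealing over the full schedule: 100 epochs (50 for Stanford Dogs).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
```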