Learning Attentive Pairwise Interaction for Fine-Grained Classification
Authors: Peiqin Zhuang, Yali Wang, Yu Qiao (pp. 13130–13137)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on five popular benchmarks in fine-grained classification. API-Net outperforms the recent SOTA methods, i.e., CUB-200-2011 (90.0%), Aircraft (93.9%), Stanford Cars (95.3%), Stanford Dogs (90.3%), and NABirds (88.1%). |
| Researcher Affiliation | Collaboration | 1Shenzhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; 2SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | We evaluate API-Net on five popular fine-grained benchmarks, i.e., CUB-200-2011(Wah et al. 2011), Aircraft(Maji et al. 2013), Stanford Cars(Krause et al. 2013), Stanford Dogs(Khosla et al. 2011) and NABirds (Van Horn et al. 2015). |
| Dataset Splits | Yes | We use the official train & test splits for evaluation. For all the datasets, we randomly sample 30 categories in each batch. For each category, we randomly sample 4 images. For each image, we find its most similar image from its own class and the rest classes, according to Euclidean distance between features. As a result, we obtain an intra pair and an inter pair for each image in the batch. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running experiments. |
| Software Dependencies | No | The paper mentions "PyTorch" but does not specify a version number for it or any other software dependency. |
| Experiment Setup | Yes | First, we resize each image to 512×512 and crop a 448×448 patch as input to API-Net (train: random cropping; test: center cropping). Furthermore, we use ResNet-101 (pretrained on ImageNet) as the CNN backbone, and extract the feature vector x_i ∈ R^2048 after the global average pooling operation. Second, for all the datasets, we randomly sample 30 categories in each batch. For each category, we randomly sample 4 images. [...] For all the datasets, the coefficient λ in Eq. (8) is 1.0, and the margin ϵ in the score-ranking regularization is 0.05. We use standard SGD with momentum of 0.9 and weight decay of 0.0005. Furthermore, the initial learning rate is 0.01 (0.001 for Stanford Dogs), adjusted with a cosine annealing strategy. The total number of training epochs is 100 (50 for Stanford Dogs). Besides, during the training phase, we freeze the conv layers and train only the newly added fully-connected layers for the first 8 epochs (12 epochs for Stanford Dogs). |
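The pair-construction step quoted above (each image is matched with its nearest neighbor, by Euclidean distance in feature space, from its own class and from the other classes) can be sketched in NumPy. This is a minimal illustration, not the paper's code; the function name `build_pairs` and the array layout are assumptions.

```python
import numpy as np

def build_pairs(features, labels):
    """For each sample, find its most similar sample (smallest Euclidean
    distance in feature space) from the same class (intra pair) and from
    a different class (inter pair), as in the batch construction step.
    features: (N, D) float array; labels: (N,) int array."""
    # Pairwise Euclidean distances between all feature vectors.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # never pair a sample with itself

    intra, inter = [], []
    n = len(labels)
    for i in range(n):
        same = labels == labels[i]
        same[i] = False  # exclude self from same-class candidates
        # Nearest sample of the same class -> intra pair.
        d_same = np.where(same, dist[i], np.inf)
        intra.append((i, int(d_same.argmin())))
        # Nearest sample of any other class -> inter pair.
        d_diff = np.where(~same & (np.arange(n) != i), dist[i], np.inf)
        inter.append((i, int(d_diff.argmin())))
    return intra, inter
```

In the paper's setting, `features` would be the 2048-d ResNet-101 embeddings of a batch of 30 categories × 4 images.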