Learning Attentive Pairwise Interaction for Fine-Grained Classification

Authors: Peiqin Zhuang, Yali Wang, Yu Qiao

AAAI 2020, pp. 13130-13137

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on five popular benchmarks in fine-grained classification. API-Net outperforms the recent SOTA methods, i.e., CUB-200-2011 (90.0%), Aircraft (93.9%), Stanford Cars (95.3%), Stanford Dogs (90.3%), and NABirds (88.1%).
Researcher Affiliation | Collaboration | (1) Shenzhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; (2) SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | We evaluate API-Net on five popular fine-grained benchmarks, i.e., CUB-200-2011 (Wah et al. 2011), Aircraft (Maji et al. 2013), Stanford Cars (Krause et al. 2013), Stanford Dogs (Khosla et al. 2011) and NABirds (Van Horn et al. 2015).
Dataset Splits | Yes | We use the official train & test splits for evaluation. For all the datasets, we randomly sample 30 categories in each batch. For each category, we randomly sample 4 images. For each image, we find its most similar image from its own class and from the rest of the classes, according to the Euclidean distance between features. As a result, we obtain an intra pair and an inter pair for each image in the batch. (See the pairing sketch after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions "PyTorch" but does not specify a version number for this or any other software dependency.
Experiment Setup | Yes | First, we resize each image to 512 × 512 and crop a 448 × 448 patch as input to API-Net (train: random cropping, test: center cropping). Furthermore, we use ResNet-101 (pretrained on ImageNet) as the CNN backbone and extract the feature vector x_i ∈ R^2048 after the global average pooling operation. Second, for all the datasets, we randomly sample 30 categories in each batch. For each category, we randomly sample 4 images. [...] For all the datasets, the coefficient λ in Eq. (8) is 1.0, and the margin ϵ in the score-ranking regularization is 0.05. We use standard SGD with momentum of 0.9 and weight decay of 0.0005. Furthermore, the initial learning rate is 0.01 (0.001 for Stanford Dogs), adjusted with a cosine annealing strategy. The total number of training epochs is 100 (50 for Stanford Dogs). Besides, during the training phase, we freeze the conv layers and only train the newly-added fully-connected layers in the first 8 epochs (12 epochs for Stanford Dogs). (See the training-setup sketch after the table.)
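
The pairing procedure in the Dataset Splits row is concrete enough to sketch. Below is a minimal PyTorch rendering, assuming a batch of backbone features and integer class labels; build_pairs, its arguments, and the dummy batch are illustrative names of ours, not code from the paper.

```python
import torch

def build_pairs(features, labels):
    """For each image in the batch, return the index of its most similar
    image (smallest Euclidean distance in feature space) from its own
    class (intra pair) and from the rest of the classes (inter pair)."""
    dist = torch.cdist(features, features)      # pairwise Euclidean distances
    dist.fill_diagonal_(float("inf"))           # never pair an image with itself
    same_class = labels.unsqueeze(0) == labels.unsqueeze(1)

    intra_idx = dist.masked_fill(~same_class, float("inf")).argmin(dim=1)
    inter_idx = dist.masked_fill(same_class, float("inf")).argmin(dim=1)
    return intra_idx, inter_idx

# Batch layout from the paper: 30 categories x 4 images = 120 images,
# each represented by a 2048-d ResNet-101 feature after global average pooling.
features = torch.randn(120, 2048)
labels = torch.arange(30).repeat_interleave(4)
intra_idx, inter_idx = build_pairs(features, labels)
```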
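
The Experiment Setup row likewise maps onto standard PyTorch/torchvision calls. The sketch below wires up the reported preprocessing and optimization hyperparameters under stated assumptions: the API-Net head is reduced to a single placeholder nn.Linear (200 classes, as for CUB-200-2011), since the paper releases no reference implementation.

```python
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms

# Input pipeline: resize to 512 x 512, then crop a 448 x 448 patch
# (random crop at train time, center crop at test time).
train_tf = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.RandomCrop(448),
    transforms.ToTensor(),
])
test_tf = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.CenterCrop(448),
    transforms.ToTensor(),
])

# ImageNet-pretrained ResNet-101 backbone. Freeze the conv layers first,
# as done for the first 8 epochs (12 for Stanford Dogs); the replacement
# head added afterwards stays trainable. The real API-Net head is richer
# than this placeholder linear classifier.
backbone = torchvision.models.resnet101(pretrained=True)
for p in backbone.parameters():
    p.requires_grad = False
backbone.fc = nn.Linear(2048, 200)  # hypothetical head, e.g. 200 CUB classes

optimizer = torch.optim.SGD(
    (p for p in backbone.parameters() if p.requires_grad),
    lr=0.01,            # 0.001 for Stanford Dogs
    momentum=0.9,
    weight_decay=0.0005,
)
# Cosine annealing over the full schedule: 100 epochs (50 for Stanford Dogs).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
```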