Dynamic Position-aware Network for Fine-grained Image Recognition

Authors: Shijie Wang, Haojie Li, Zhihui Wang, Wanli Ouyang

AAAI 2021, pp. 2791-2799

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments verify that DP-Net yields the best performance under the same settings with most competitive approaches, on CUB Bird, Stanford-Cars, and FGVC Aircraft datasets.
Researcher Affiliation | Collaboration | Shijie Wang1,2, Haojie Li1,2, Zhihui Wang1,2, Wanli Ouyang3. 1International School of Information Science & Engineering, Dalian University of Technology, China; 2Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, China; 3The University of Sydney, SenseTime Computer Vision Research Group, Australia.
Pseudocode | No | The paper includes mathematical formulations and descriptions of modules but does not present any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any links or explicit statements about the availability of open-source code for the described methodology.
Open Datasets | Yes | We comprehensively evaluate our algorithm on Caltech UCSD Birds (Branson et al. 2014) (CUB-200-2011), Stanford Cars (Krause et al. 2013) (Cars) and FGVC Aircraft (Airs) (Maji et al. 2013) datasets, which are widely used benchmarks for fine-grained image recognition.
Dataset Splits | Yes | The CUB-200-2011 dataset contains 11,788 images spanning 200 subspecies; the ratio of train data to test data is roughly 1:1. The Cars dataset has 16,185 images from 196 classes, officially split into 8,144 training and 8,041 test images. The Airs dataset contains 10,000 images over 100 classes, with a train/test split ratio of around 2:1.
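
Condensed, the reported split statistics are:

Dataset | Classes | Total Images | Train / Test
CUB-200-2011 | 200 | 11,788 | roughly 1:1
Stanford Cars | 196 | 16,185 | 8,144 / 8,041
FGVC Aircraft | 100 | 10,000 | roughly 2:1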
Hardware Specification | No | The paper does not explicitly specify any hardware details (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions "ResNet50 as feature extractor" and "Momentum SGD" but does not provide specific version numbers for these or other software libraries/frameworks.
Experiment Setup | Yes | In all our experiments, all images are resized to 448 × 448, and we crop and resize the patches to 224 × 224 from the original image. We use fully-convolutional network ResNet50 as feature extractor and apply Batch Normalization as regularizer. We also use Momentum SGD with initial learning rate 0.001, multiplied by 0.1 after 60 epochs. We use weight decay 1e-4. To reduce patch redundancy, we adopt non-maximum suppression (NMS) on default patches based on their discriminative scores, and the NMS threshold is set to 0.25.
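
A minimal PyTorch/torchvision sketch of this reported configuration follows. It is not the authors' code (which is unreleased): the momentum value and the ImageNet-pretrained weights are assumptions, and select_patches is a hypothetical helper wrapping torchvision's NMS.

import torch
import torchvision
from torchvision.ops import nms

# Fully-convolutional ResNet-50 feature extractor; BatchNorm layers act as the
# regularizer. ImageNet-pretrained weights are assumed (not stated in the quote).
resnet = torchvision.models.resnet50(weights="IMAGENET1K_V1")
backbone = torch.nn.Sequential(*list(resnet.children())[:-2])  # drop avgpool and fc

# Momentum SGD: initial lr 0.001, multiplied by 0.1 after 60 epochs, weight
# decay 1e-4. momentum=0.9 is an assumption; the paper quote gives no value.
optimizer = torch.optim.SGD(backbone.parameters(), lr=1e-3,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60], gamma=0.1)

# Inputs: full images resized to 448x448; patches cropped from the original
# image are resized to 224x224.
resize_image = torchvision.transforms.Resize((448, 448))
resize_patch = torchvision.transforms.Resize((224, 224))

def select_patches(boxes: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: suppress redundant default patches with NMS.

    boxes: (N, 4) tensor in (x1, y1, x2, y2) format; scores: (N,) discriminative
    scores. Returns indices of patches kept at the paper's IoU threshold of 0.25.
    """
    return nms(boxes, scores, iou_threshold=0.25)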