Channel Interaction Networks for Fine-Grained Image Categorization
Authors: Yu Gao, Xintong Han, Xun Wang, Weilin Huang, Matthew R. Scott
AAAI 2020, pp. 10818-10825 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, comprehensive experiments are conducted on three publicly available benchmarks, where the proposed method consistently outperforms the state-of-the-art approaches, such as DFL-CNN (Wang, Morariu, and Davis 2018) and NTS (Yang et al. 2018). |
| Researcher Affiliation | Industry | Yu Gao, Xintong Han, Xun Wang, Weilin Huang, Matthew R. Scott. Malong Technologies, Shenzhen, China; Shenzhen Malong Artificial Intelligence Research Center, Shenzhen, China. {chrgao, xinhan, xunwang, whuang, mscott}@malong.com |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | We employ three publicly available datasets in our experiments: (1) CUB-200-2011 (Wah et al. 2011) with 11,788 images from 200 wild bird species, (2) Stanford Cars (Krause et al. 2013) including 16,185 images over 196 classes, and (3) FGVC Aircraft (Maji et al. 2013) containing about 10,000 images over 100 classes. |
| Dataset Splits | No | The paper uses publicly available datasets (CUB-200-2011, Stanford Cars, FGVC Aircraft) but does not explicitly provide the training/validation/test splits (e.g., percentages or exact counts for each split). |
| Hardware Specification | Yes | We report our inference time on an Nvidia TITAN XP GPU with a PyTorch implementation. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify its version number or any other software dependencies with version details. |
| Experiment Setup | Yes | The input image size is 448×448, as in most state-of-the-art fine-grained categorization approaches. Following NTS (Yang et al. 2018), we implement data augmentation including random cropping and horizontal flipping during training. Only center cropping is involved in inference. The model is trained for 100 epochs with SGD for all datasets, and the base learning rate is set to 0.001, which is annealed by 0.5 every 20 epochs. We use a batch size of 20 and ensure that each batch contains 4 categories with 5 images in each category. We then randomly split these 20 images into 10 image pairs. We tried using all the O(n^2) pairs or applying hard negative mining, which hurt the performance and consumed more memory. The weight decay is set to 2×10^-4. β in Equation 8 is set to 0.5 empirically, and α in Equation 9 is set to 2.0. Top-1 accuracy is used as the evaluation metric. We use PyTorch to implement our method. (A hedged sketch of this training configuration follows the table.) |
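The quoted setup can be approximated in PyTorch as below. This is a minimal sketch, assuming a standard torchvision backbone stands in for the paper's CIN model (no official code is released); the resize size, momentum value, and the names `FourByFiveSampler` and `labels` are illustrative assumptions, not the authors' choices.

```python
# Sketch of the reported training configuration: 448x448 inputs, random crop +
# horizontal flip, batches of 4 categories x 5 images, SGD with lr 0.001 halved
# every 20 epochs, weight decay 2e-4. Backbone and some values are assumptions.
import random
from collections import defaultdict

import torch
import torchvision
import torchvision.transforms as T
from torch.utils.data import Sampler

# Training-time augmentation (random crop + flip); inference uses center crop only.
train_transform = T.Compose([
    T.Resize((512, 512)),        # pre-crop resize size is an assumption
    T.RandomCrop(448),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
test_transform = T.Compose([
    T.Resize((512, 512)),        # assumption
    T.CenterCrop(448),
    T.ToTensor(),
])


class FourByFiveSampler(Sampler):
    """Yields batches of 20 indices: 4 randomly chosen classes, 5 images each."""

    def __init__(self, labels, classes_per_batch=4, images_per_class=5):
        self.by_class = defaultdict(list)
        for idx, y in enumerate(labels):
            self.by_class[y].append(idx)
        self.classes = list(self.by_class)
        self.k = classes_per_batch
        self.m = images_per_class
        self.num_batches = len(labels) // (self.k * self.m)

    def __iter__(self):
        for _ in range(self.num_batches):
            batch = []
            for c in random.sample(self.classes, self.k):
                batch.extend(random.sample(self.by_class[c], self.m))
            random.shuffle(batch)   # the 20 images are later split into 10 pairs
            yield batch

    def __len__(self):
        return self.num_batches


# Stand-in backbone; the paper's channel-interaction modules are not shown here.
model = torchvision.models.resnet101(num_classes=200)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                            momentum=0.9,        # momentum is an assumption
                            weight_decay=2e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)
```

A sampler like this would be plugged in via `DataLoader(dataset, batch_sampler=FourByFiveSampler(labels))`, with the 20-image batch split into 10 random pairs inside the loss computation, mirroring the pairing scheme described in the quote.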