Fine-Grained Retrieval Prompt Tuning
Authors: Shijie Wang, Jianlong Chang, Zhihui Wang, Haojie Li, Wanli Ouyang, Qi Tian
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our FRPT with fewer learnable parameters achieves the state-of-the-art performance on three widely-used fine-grained datasets. ... Datasets. CUB-200-2011 (Branson et al. 2014) contains 200 bird subcategories with 11,788 images. We utilize the first 100 classes (5,864 images) in training and the rest (5,924 images) in testing. The Stanford Cars (Krause et al. 2013) contains 196 car models of 16,185 images. ... Ablation Study. We conduct some ablation experiments to illustrate the effectiveness of the proposed modules. ... Comparison with the State-of-the-Art Methods. We compare our FRPT with state-of-the-art (SOTA) fine-grained object retrieval approaches. In Tab. 2, the performance of different methods on CUB-200-2011, Stanford Cars-196, and FGVC Aircraft datasets is reported, respectively. |
| Researcher Affiliation | Collaboration | Shijie Wang (1), Jianlong Chang (2), Zhihui Wang (1), Haojie Li (1,3*), Wanli Ouyang (4), Qi Tian (2). 1 International School of Information Science & Engineering, Dalian University of Technology, China; 2 Huawei Cloud & AI, China; 3 College of Computer and Engineering, Shandong University of Science and Technology, China; 4 SenseTime Computer Vision Research Group, The University of Sydney, Australia |
| Pseudocode | No | The paper describes the approach textually and with diagrams (e.g., Fig. 1), but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any specific links to open-source code or state that the code will be made publicly available. |
| Open Datasets | Yes | Datasets. CUB-200-2011 (Branson et al. 2014) contains 200 bird subcategories with 11,788 images. We utilize the first 100 classes (5,864 images) in training and the rest (5,924 images) in testing. The Stanford Cars (Krause et al. 2013) contains 196 car models of 16,185 images. The split in Stanford Cars (Krause et al. 2013) is similar to CUB: the first 98 classes (8,045 images) are used for training and the remaining classes (8,131 images) for testing. FGVC Aircraft (Maji et al. 2013) is divided into the first 50 classes (5,000 images) for training and the remaining 50 classes (5,000 images) for testing. (The class-disjoint split logic is sketched in code after the table.) |
| Dataset Splits | No | The paper specifies training and testing splits for datasets but does not explicitly mention a separate validation split or how it's handled (e.g., within training). |
| Hardware Specification | Yes | Our model is relatively lightweight and is trained end-to-end on two NVIDIA 2080Ti GPUs for acceleration. |
| Software Dependencies | No | The paper mentions using ResNet, SGD optimizer, and common data augmentation techniques, but does not provide specific version numbers for any software libraries or dependencies. |
| Experiment Setup | Yes | We train our models using Stochastic Gradient Descent (SGD) optimizer with weight decay of 0.0001, momentum of 0.9, and batch size of 32. We adopt the commonly used data augmentation techniques, i.e., random cropping and erasing, left-right flipping, and color jittering for robust feature representations. The total number of training epochs is set to 500. The initial learning rate is set to 10^-3, with exponential decay of 0.9 after every 50 epochs. (This setup is sketched in code after the table.) |
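
The class-disjoint splits quoted in the Open Datasets row (first 100 of 200 CUB classes, first 98 of 196 Stanford Cars classes, first 50 of 100 FGVC Aircraft classes for training) reduce to a simple class-index threshold. The sketch below is illustrative only; the `samples` input and the dataset keys are assumptions based on the quoted counts, not code released with the paper.

```python
# Minimal sketch of the open-set, class-disjoint splits described in the paper:
# roughly the first half of the classes is used for training, the rest for testing.
# `samples` (a list of (image_path, class_id) pairs with 0-based class ids) is a
# hypothetical input; the paper provides no data-loading code.

NUM_TRAIN_CLASSES = {
    "cub200": 100,    # 200 bird classes: 5,864 train / 5,924 test images
    "cars196": 98,    # 196 car models: 8,045 train / 8,131 test images
    "aircraft": 50,   # 100 aircraft classes: 5,000 train / 5,000 test images
}

def split_by_class(samples, num_train_classes):
    """Assign whole classes to train or test, never splitting a class."""
    train = [(path, cls) for path, cls in samples if cls < num_train_classes]
    test = [(path, cls) for path, cls in samples if cls >= num_train_classes]
    return train, test
```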
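The hyper-parameters quoted in the Experiment Setup row map directly onto a standard PyTorch optimizer and step schedule. The following is a minimal sketch under that assumption; `model`, the 224-pixel crop size, and the ColorJitter strengths are placeholders that the paper does not specify.

```python
import torch
from torchvision import transforms

# Augmentations named in the paper; crop size and jitter strengths are assumptions.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),      # random cropping
    transforms.RandomHorizontalFlip(),      # left-right flipping
    transforms.ColorJitter(0.4, 0.4, 0.4),  # color jittering
    transforms.ToTensor(),
    transforms.RandomErasing(),             # random erasing (applied on tensors)
])

model = torch.nn.Linear(512, 200)  # stand-in for the learnable prompt parameters

# SGD with initial learning rate 1e-3, momentum 0.9, weight decay 1e-4.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                            momentum=0.9, weight_decay=1e-4)

# Decay of 0.9 applied after every 50 epochs, over 500 epochs in total.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.9)

EPOCHS, BATCH_SIZE = 500, 32
for epoch in range(EPOCHS):
    # ... one training pass over batches of size 32 would go here ...
    scheduler.step()
```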