Image Retrieval with Self-Supervised Divergence Minimization and Cross-Attention Classification
Authors: Vivek Trivedy, Longin Jan Latecki
IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method across several model configurations and four datasets, achieving state-of-the-art performance in multiple settings. We also conduct a thorough set of ablations that show the robustness of our method across full vs. approximate retrieval and different hyperparameter configurations. |
| Researcher Affiliation | Academia | Vivek Trivedy, Longin Jan Latecki, Department of Computer and Information Sciences, Temple University, Philadelphia {vivek.trivedy, latecki}@temple.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | CUB-200 [Wah et al., 2011] is a fine-grained dataset containing 11,788 images covering 200 sub-category classes of birds. In-Shop Clothes Retrieval [Liu et al., 2016] is a clothing dataset containing 52,712 total images and 7,896 classes. Cars-196 [Krause et al., 2013] is a dataset of car models containing 16,185 total images and 196 classes. Stanford Online Products (SOP) [Song et al., 2016] contains 120,053 images of 22,634 products downloaded from eBay.com. |
| Dataset Splits | Yes | In training, we use 80-20 stratified split over classes to create the Training Query and Training Database Sets, respectively. Our splits follow the common approaches used in [El-Nouby et al., 2021a; Ermolov et al., 2022]. CUB-200 ... use the first 100 classes for training and the remaining for testing with no class overlap between train and test settings. In-Shop Clothes Retrieval ... use the first 3997 classes for training and the remaining classes for testing. Cars-196 ... We split the dataset into 8,054 images for training (98 classes) and the remaining 98 classes for testing. Stanford Online Products (SOP) ... We use the standard split of 59,551 images (11,318 classes) for training and 60,502 images (11,316 classes) for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for experiments. |
| Software Dependencies | No | The paper mentions 'AdamW' as an optimizer but does not provide specific version numbers for any software components or libraries. |
| Experiment Setup | Yes | We use 224×224 input resolution across datasets. During training, we resize the smaller side to 256 and take a random crop of size 224×224, while during testing we take a center crop of 224×224. ... Unless otherwise stated, we use A = 6 to generate A views of a query (which includes the query as one of the views) during training and use the approximate retrieval approach with k = 12 rather than the full retrieval approach. Unless otherwise stated, we use β_frob = β_ce = β_cac = 1 as described in Equation 20. We use AdamW [Loshchilov and Hutter, 2019] with learning rate 3×10⁻⁵ as the optimizer with weight decay 0.01. We use a batch size of 256 for all datasets except SOP, where we use 128, and the number of steps as 250, 600, 2500, and 25000 for CUB, Cars, In-Shop, and SOP, respectively. (A minimal configuration sketch follows the table.) |
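
The experiment setup quoted above can be expressed as a short configuration sketch. The following is a minimal, unofficial PyTorch/torchvision sketch of the reported preprocessing, optimizer, and schedule settings; the backbone (`resnet50` here) and the dataset keys are placeholders for illustration only, not details taken from the paper.

```python
# Minimal sketch of the reported training configuration, assuming a standard
# PyTorch/torchvision pipeline. The backbone below is a placeholder; the paper's
# actual architecture and retrieval heads are not reproduced here.
import torch
from torchvision import transforms
from torchvision.models import resnet50

# Reported preprocessing: resize the smaller side to 256, take a random
# 224x224 crop during training and a center 224x224 crop during testing.
train_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.ToTensor(),
])
test_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# Placeholder encoder (the paper does not state that this is its backbone).
model = resnet50(weights=None)

# Reported optimizer: AdamW with learning rate 3e-5 and weight decay 0.01.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5, weight_decay=0.01)

# Reported hyperparameters: A = 6 views per query (query included), approximate
# retrieval with k = 12, loss weights beta_frob = beta_ce = beta_cac = 1,
# batch size 256 (128 for SOP), and a fixed number of steps per dataset.
config = {
    "views_per_query": 6,
    "retrieval_k": 12,
    "loss_weights": {"frob": 1.0, "ce": 1.0, "cac": 1.0},
    "batch_size": {"CUB": 256, "Cars": 256, "InShop": 256, "SOP": 128},
    "num_steps": {"CUB": 250, "Cars": 600, "InShop": 2500, "SOP": 25000},
}
```

This only mirrors the hyperparameters quoted in the table; the self-supervised divergence-minimization objective and the cross-attention classifier themselves are not sketched here.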