Stop-Gradient Softmax Loss for Deep Metric Learning

Authors: Lu Yang, Peng Wang, Yanning Zhang

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on four fine-grained image retrieval benchmarks show that our proposed approach outperforms most existing approaches, i.e., our approach achieves 75.9% on CUB-200-2011, 94.7% on CARS-196, and 83.1% on SOP, which outperforms other approaches by at least 1.7%, 2.9%, and 1.7% on Recall@1.
Researcher Affiliation | Academia | 1) School of Computer Science, Northwestern Polytechnical University, Xi'an, China; 2) National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, China
Pseudocode | No | The paper presents mathematical formulas and descriptions of its methods but does not include structured pseudocode or algorithm blocks. (An illustrative sketch of the loss follows the table.)
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | We conduct extensive experiments on four public image retrieval benchmarks, i.e., CUB-200-2011 (Welinder et al. 2010), CARS-196 (Krause et al. 2013), Stanford Online Products (SOP) (Oh Song et al. 2016a), and In-shop Clothes Retrieval (Liu et al. 2016).
Dataset Splits | Yes | CUB-200-2011 has 200 classes with 11,788 images: the first 100 classes (5,864 images) for training and the remaining classes (5,924 images) for testing. CARS-196 has 196 classes with 16,185 images: the first 98 classes (8,054 images) for training and the other 98 classes (8,131 images) for testing. Stanford Online Products has 22,634 classes with 120,053 images: the first 11,318 classes (59,551 images) for training and the other 11,316 classes (60,502 images) for testing. (A split-construction sketch follows the table.)
Hardware Specification | Yes | Our experiments were executed using PyTorch on a GTX 2080Ti GPU.
Software Dependencies | No | Our experiments were executed using PyTorch on a GTX 2080Ti GPU. All the experiments use ResNet50 as the backbone, pre-trained on ImageNet. No specific version numbers for software dependencies are provided.
Experiment Setup | Yes | All input images were resized to 256 × 256 and cropped to 224 × 224, with a batch size of 64 (4 images/ID and 16 IDs), and we use fp16 to improve GPU memory utilization. The model is trained for 100 epochs and the learning rate follows a cosine annealing schedule. We set γ = 30 by default. To build a robust model that generalizes well, we use label smoothing for L_softmax. For training stability, SGSL joins the training only once the value of the softmax loss falls below 3. (A configuration sketch follows the table.)
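
Since the paper ships no pseudocode, the following is a minimal, hypothetical PyTorch sketch of a scaled cosine-softmax loss with a stop-gradient term, reusing the γ = 30 scale and label smoothing from the reported setup. The placement of the `detach()` (blocking gradients through the non-target logits) is an illustrative assumption, not the paper's exact SGSL formulation; `sg_softmax_loss`, `features`, and `weights` are names invented here.

```python
import torch
import torch.nn.functional as F

def sg_softmax_loss(features, weights, labels, gamma=30.0, smoothing=0.1):
    # L2-normalize embeddings and class proxies so logits are cosine similarities
    f = F.normalize(features, dim=1)
    w = F.normalize(weights, dim=1)
    logits = gamma * f @ w.t()                        # (batch, num_classes)

    # Illustrative stop-gradient: keep gradients only through the target logit,
    # detaching the non-target logits. This placement is an assumption.
    onehot = F.one_hot(labels, num_classes=logits.size(1)).bool()
    logits = torch.where(onehot, logits, logits.detach())

    # Cross-entropy with label smoothing, matching the L_softmax setup above
    return F.cross_entropy(logits, labels, label_smoothing=smoothing)
```

For instance, with 512-dimensional embeddings and 100 training classes, `weights` would be a learnable (100, 512) proxy matrix.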
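
The splits above follow the standard open-set deep metric learning protocol: train and test classes are disjoint, with the first half of the class list used for training. A small sketch of how such a split can be materialized from an ImageFolder-style directory layout; `dml_split` and the dataset path are hypothetical:

```python
from torchvision.datasets import ImageFolder

def dml_split(root, num_train_classes):
    """Open-set split: first `num_train_classes` classes for training, the
    remaining classes for testing (CUB: 100/100, CARS: 98/98, SOP: 11,318/11,316).
    Assumes one directory per class; ImageFolder indexes classes in sorted order.
    """
    ds = ImageFolder(root)
    train = [(path, cls) for path, cls in ds.samples if cls < num_train_classes]
    test  = [(path, cls) for path, cls in ds.samples if cls >= num_train_classes]
    return train, test

# e.g. CUB-200-2011: classes 1-100 (5,864 images) train, 101-200 (5,924) test
train_samples, test_samples = dml_split("CUB_200_2011/images", 100)
```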
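
The experiment setup in the last row translates to roughly the following PyTorch training skeleton. The SGD optimizer, base learning rate, and random horizontal flip are assumptions filled in for completeness; `loader` and `compute_loss` are placeholders for a P×K-sampled dataloader (16 IDs × 4 images each) and the paper's loss.

```python
import torch
from torchvision import models, transforms

# Resize to 256x256 and crop to 224x224 as stated; random crop/flip are assumptions
train_tf = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

model = models.resnet50(weights="IMAGENET1K_V1").cuda()   # ImageNet-pretrained backbone
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)  # optimizer and lr are illustrative
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)  # cosine over 100 epochs
scaler = torch.cuda.amp.GradScaler()   # fp16 mixed precision for GPU memory savings

for epoch in range(100):
    for images, labels in loader:      # placeholder: batches of 64 = 16 IDs x 4 images/ID
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = compute_loss(model(images.cuda()), labels.cuda())  # placeholder loss
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
    scheduler.step()
```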