Deep Metric Learning by Online Soft Mining and Class-Aware Attention

Authors: Xinshao Wang, Yang Hua, Elyor Kodirov, Guosheng Hu, Neil M. Robertson

AAAI 2019, pp. 5361–5368

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments on two fine-grained visual categorisation datasets and two video-based person re-identification benchmarks show that our method significantly outperforms the state-of-the-art. |
| Researcher Affiliation | Collaboration | Xinshao Wang (1,2), Yang Hua (1), Elyor Kodirov (2), Guosheng Hu (1,2), Neil M. Robertson (1,2). (1) School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, UK; (2) Anyvision Research Team, UK. {xwang39, y.hua, n.robertson}@qub.ac.uk, {elyor, guosheng.hu}@anyvision.co |
| Pseudocode | No | The paper describes its methods using mathematical equations and textual descriptions but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | 'We will release the source code and trained models for the ease of reproduction.' |
| Open Datasets | Yes | CUB-200-2011 (Wah et al. 2011) contains 11,788 images of 200 species of birds. CARS196 (Krause et al. 2013) is composed of 16,185 images of 196 types of cars. MARS (Zheng et al. 2016) consists of images (frames) with huge variations due to camera setup. LPW (Song et al. 2018) is a cross-scene video dataset. |
| Dataset Splits | No | The paper describes training and testing splits for CUB-200-2011 and CARS196, but does not explicitly mention a validation set split or how it was used: 'The first 100 classes (5,864 images) are used for training and the remaining classes (5,924 images) for testing.' A class-disjoint split of this kind is sketched below. |
| Hardware Specification | No | The paper does not provide specific hardware details such as the GPU or CPU models used for the experiments. |
| Software Dependencies | No | The paper states: 'We implement our method in Caffe (Jia et al. 2014) deep learning framework.' It names Caffe but gives no version number, and no other software dependencies are listed with versions. |
| Experiment Setup | Yes | For all datasets, images are resized to 224 × 224 during training and testing, and no data augmentation is applied. For fine-grained categorisation, each mini-batch uses c = 8 and k = 7; for video-based person re-identification, c = 3 and k = 18. σ_OSM = 0.8 and σ_CAA = 0.18 are set empirically for all experiments, avoiding dataset-specific tuning to keep the setup general and the comparison fair. When training a model, the weights are initialised from a model pretrained on ImageNet (Russakovsky et al. 2015). For optimisation, Stochastic Gradient Descent (SGD) is used with a learning rate of 0.001 and a momentum of 0.9. The margin α of the weighted contrastive loss is set to 1.2 for all experiments, and λ is fixed at 0.5, treating the positive set and negative set equally. (See the training-setup sketch below.) |