Revisiting Training Strategies and Generalization Performance in Deep Metric Learning

Authors: Karsten Roth, Timo Milbich, Samarth Sinha, Prateek Gupta, Björn Ommer, Joseph Paul Cohen

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To provide a consistent reference point, we revisit the most widely used DML objective functions and conduct a study of the crucial parameter choices as well as the commonly neglected mini-batch sampling process.
Researcher Affiliation | Academia | 1 Mila, Université de Montréal; 2 HCI/IWR, Heidelberg University; 3 University of Toronto; 4 The Alan Turing Institute, University of Oxford.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code and a publicly accessible WandB repo are available at https://github.com/Confusezius/Revisiting_Deep_Metric_Learning_PyTorch.
Open Datasets | Yes | CUB200-2011: contains 11,788 images of birds in 200 classes; train/test sets are made up of the first/last 100 classes (5,864/5,924 images respectively) (Wah et al., 2011). CARS196: has 16,185 images over 196 car classes with an even sample distribution; train/test sets use the first/last 98 classes (8,054/8,131 images) (Krause et al., 2013). Stanford Online Products (SOP): contains 120,053 product images divided into 22,634 classes; the provided train/test sets contain 11,318 classes/59,551 images and 11,316 classes/60,502 images respectively (Oh Song et al., 2016). A minimal split sketch follows the table.
Dataset Splits | No | The paper specifies train and test splits, but does not provide explicit details for a validation dataset split.
Hardware Specification | Yes | Experiments are performed on individual Nvidia Titan X, V100 and T4 GPUs with memory usage limited to 12 GB. We also thank Nvidia for donating NVIDIA DGX-1.
Software Dependencies | No | The paper mentions 'implemented all models in PyTorch (Paszke et al., 2017)' but does not provide a specific version number for PyTorch or other software dependencies.
Experiment Setup | Yes | Our training protocol follows parts of Wu et al. (2017), which utilize a ResNet50 architecture with frozen Batch Normalization layers and embedding dim. 128 to be comparable with already proposed results with this architecture. In line with standard practices we randomly resize and crop images to 224×224 for training and center crop to the same size for evaluation. During training, random horizontal flipping (p = 0.5) is used. Optimization is performed using Adam (Kingma & Ba, 2015) with learning rate fixed to 10^-5 and no learning rate scheduling for unbiased comparison. Weight decay is set to a constant value of 4·10^-4, as motivated in section 4.2. Each training is run over 150 epochs for CUB200-2011/CARS196 and 100 epochs for Stanford Online Products, if not stated otherwise. For batch sampling we utilize the SPC-2 strategy, as motivated in section 4.3. A training-setup sketch follows the table.
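
The class-disjoint splits listed under Open Datasets (first/last 100 classes for CUB200-2011, first/last 98 for CARS196) translate into a simple partition over sorted class folders. Below is a minimal sketch, not the authors' code, assuming a directory layout of <root>/<class_name>/<image_file>; the function name and layout are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the authors' code) of the class-disjoint
# train/test splits described above, assuming images are stored as
# <root>/<class_name>/<image_file>.
import os

def split_by_class(root, n_train_classes):
    """Assign the first n_train_classes (sorted by folder name) to train,
    the remaining classes to test, and return lists of (path, class) pairs."""
    classes = sorted(d for d in os.listdir(root)
                     if os.path.isdir(os.path.join(root, d)))
    train_classes = set(classes[:n_train_classes])
    train, test = [], []
    for cls in classes:
        cls_dir = os.path.join(root, cls)
        paths = [os.path.join(cls_dir, f) for f in sorted(os.listdir(cls_dir))]
        (train if cls in train_classes else test).extend((p, cls) for p in paths)
    return train, test

# CUB200-2011 uses the first/last 100 classes, CARS196 the first/last 98:
# train_samples, test_samples = split_by_class("CUB_200_2011/images", n_train_classes=100)
```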
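
The quoted protocol translates fairly directly into a PyTorch configuration. The following is a minimal sketch under the stated settings (ResNet50 with frozen BatchNorm, 128-dim embedding, Adam with learning rate 10^-5 and weight decay 4·10^-4, the described augmentations); the ImageNet normalization statistics, the `EmbeddingNet` wrapper, and the use of `RandomResizedCrop` are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (assumptions noted, not the authors' implementation) of the
# quoted protocol: ResNet50 backbone with frozen BatchNorm, 128-dim embedding,
# Adam with lr 10^-5, weight decay 4*10^-4, and the described augmentations.
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as T

IMAGENET_STATS = dict(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # assumed

class EmbeddingNet(nn.Module):
    """ResNet50 with its classifier replaced by a 128-dim embedding head."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.backbone = models.resnet50(pretrained=True)
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, embed_dim)
        for m in self.backbone.modules():          # freeze BatchNorm parameters
            if isinstance(m, nn.BatchNorm2d):
                for p in m.parameters():
                    p.requires_grad = False

    def train(self, mode=True):
        super().train(mode)
        for m in self.backbone.modules():          # keep BatchNorm in eval mode
            if isinstance(m, nn.BatchNorm2d):
                m.eval()
        return self

    def forward(self, x):
        return nn.functional.normalize(self.backbone(x), dim=-1)

# "Randomly resize and crop to 224x224" is read here as RandomResizedCrop (assumption).
train_transform = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),
    T.Normalize(**IMAGENET_STATS),
])
test_transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(**IMAGENET_STATS),
])

model = EmbeddingNet(embed_dim=128)
# Fixed learning rate 10^-5 with no scheduler; constant weight decay 4*10^-4.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, weight_decay=4e-4)
```

For batch construction, SPC-2 in the paper denotes two samples per class per mini-batch; in PyTorch this is typically realized with a custom batch sampler passed to the DataLoader via its batch_sampler argument.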