Improved Deep Metric Learning with Multi-class N-pair Loss Objective
Authors: Kihyuk Sohn
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the superiority of our proposed loss to the triplet loss as well as other competing loss functions for a variety of tasks on several visual recognition benchmarks, including fine-grained object recognition and verification, image clustering and retrieval, and face verification and identification. |
| Researcher Affiliation | Industry | Kihyuk Sohn NEC Laboratories America, Inc. ksohn@nec-labs.com |
| Pseudocode | No | The paper describes algorithms in prose and mathematical formulations but does not include a clearly labeled pseudocode or algorithm block. (A hedged code sketch of the N-pair loss follows this table.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | Car-333 [29] dataset is composed of 164,863 images of cars from 333 model categories collected from the internet. Following the experimental protocol [29], we split the dataset into 157,023 images for training and 7,840 for testing. Flower-610 dataset contains 61,771 images of flowers from 610 different flower species and among all collected, 58,721 images are used for training and 3,050 for testing. Stanford Online Product [21] dataset is composed of 120,053 images from 22,634 online product categories, and is partitioned into 59,551 images of 11,318 categories for training and 60,502 images of 11,316 categories for testing. Stanford Car-196 [12] dataset is composed of 16,185 images of cars from 196 model categories. Caltech-UCSD Birds (CUB-200) [25] dataset is composed of 11,788 images of birds from 200 different species. We train our networks on the WebFace database [31], which is composed of 494,414 images from 10,575 identities, and evaluate the quality of embedding networks trained with different metric learning objectives on the Labeled Faces in the Wild (LFW) [8] database. |
| Dataset Splits | Yes | We split the dataset into 157,023 images for training and 7,840 for testing. Flower-610 dataset... 58,721 images are used for training and 3,050 for testing. Stanford Online Product... is partitioned into 59,551 images of 11,318 categories for training and 60,502 images of 11,316 categories for testing. The first 98 model categories are used for training and the rest for testing. Similarly, we use the first 100 categories for training. We perform 5-fold cross-validation on the training set and report the average performance on the test set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions "Caffe [10]" and "Adam [11]" but does not provide version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We train networks for 40k iterations with 144 examples per batch. This corresponds to 72 pairs per batch for N-pair losses. We use Adam [11] for mini-batch stochastic gradient descent with data augmentation, namely horizontal flips and random crops. For evaluation, we extract a feature vector and compute the cosine similarity for verification. For all our experiments except for face verification, we use an ImageNet-pretrained GoogLeNet [23] for network initialization; for face verification, we use the same network architecture as CasiaNet [31] but trained from scratch without the last fully-connected layer for softmax classification. All networks are trained for 240k iterations, while the learning rate is decreased from 0.0003 to 0.0001 and 0.00003 at 160k and 200k iterations, respectively. (A sketch encoding these reported settings also follows the table.) |
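
Since the paper states the multi-class N-pair loss only in prose and equations, here is a minimal sketch of it, assuming PyTorch; the function name `n_pair_loss` is ours, not the authors'. Over N (anchor, positive) pairs, the loss L = (1/N) Σᵢ log(1 + Σ_{j≠i} exp(fᵢᵀfⱼ⁺ − fᵢᵀfᵢ⁺)) is algebraically identical to softmax cross-entropy over the anchor-positive similarity matrix, which is what the code computes. For brevity, the sketch omits the paper's L2-norm regularization on the embedding vectors.

```python
import torch
import torch.nn.functional as F

def n_pair_loss(anchors: torch.Tensor, positives: torch.Tensor) -> torch.Tensor:
    """Multi-class N-pair loss, minimal sketch (function name is ours).

    anchors, positives: (N, D) embedding batches, where positives[i] is the
    positive for anchors[i] and acts as a negative for every other anchor.
    """
    # logits[i, j] = f_i . f_j+  (inner-product similarity, as in the paper)
    logits = anchors @ positives.t()                      # (N, N)
    # log(1 + sum_{j != i} exp(f_i.f_j+ - f_i.f_i+)) equals softmax
    # cross-entropy on row i with target class i.
    targets = torch.arange(anchors.size(0), device=anchors.device)
    return F.cross_entropy(logits, targets)

# Smoke test with random embeddings (N = 72 pairs, matching the reported batches).
loss = n_pair_loss(torch.randn(72, 512), torch.randn(72, 512))
```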
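The Experiment Setup row reports 144 examples (72 pairs) per batch and, for the face-verification runs, a stepped learning-rate schedule over 240k iterations. Since no code is released, the sketch below merely encodes those reported numbers; `sample_n_pair_batch`, `images_by_class`, and `lr_at` are hypothetical names, and the sampler assumes at least two images per class.

```python
import random

N_PAIRS = 72  # 144 examples per batch = 72 (anchor, positive) pairs

def sample_n_pair_batch(images_by_class: dict) -> tuple[list, list]:
    """Hypothetical N-pair batch sampler: draw N_PAIRS distinct classes and
    one (anchor, positive) pair from each, yielding 2 * N_PAIRS examples."""
    classes = random.sample(sorted(images_by_class), N_PAIRS)
    anchors, positives = [], []
    for c in classes:
        a, p = random.sample(images_by_class[c], 2)
        anchors.append(a)
        positives.append(p)
    return anchors, positives

def lr_at(iteration: int) -> float:
    """Stepped schedule reported for face verification (240k iterations total):
    0.0003 until 160k, 0.0001 until 200k, then 0.00003."""
    if iteration < 160_000:
        return 3e-4
    if iteration < 200_000:
        return 1e-4
    return 3e-5
```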