It Takes Two to Tango: Mixup for Deep Metric Learning
Authors: Shashanka Venkataramanan, Bill Psomas, Ewa Kijak, Laurent Amsaleg, Konstantinos Karantzalos, Yannis Avrithis
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate the effect of improved representations, we show that mixing inputs, intermediate representations or embeddings along with target labels significantly outperforms state-of-the-art metric learning methods on four benchmark deep metric learning datasets. |
| Researcher Affiliation | Academia | 1Inria, Univ Rennes, CNRS, IRISA 2Athena RC 3National Technical University of Athens |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found. |
| Open Source Code | No | The paper adapts an 'official code' from another work (https://github.com/navervision/proxy-synthesis) but does not provide a statement or link for the open-source code of their own methodology (Metrix). |
| Open Datasets | Yes | We experiment on Caltech-UCSD Birds (CUB200) (Wah et al., 2011), Stanford Cars (Cars196) (Krause et al., 2013), Stanford Online Products (SOP) (Oh Song et al., 2016) and In-Shop Clothing retrieval (In-Shop) (Liu et al., 2016) image datasets. |
| Dataset Splits | No | The paper provides statistics for training and testing images (e.g., Table 4: '# training images 5,894', '# testing images 5,894' for CUB200), but does not explicitly mention a separate validation set or its split details. |
| Hardware Specification | Yes | On CUB200 dataset, using a batch size of 100 on an NVIDIA RTX 2080 Ti GPU, the average training time in ms/batch is 586 for MS and 817 for MS+Metrix. |
| Software Dependencies | No | The paper mentions using the 'AdamW (Loshchilov & Hutter, 2019) optimizer' but does not specify versions for any programming languages, libraries, or other software components. |
| Experiment Setup | Yes | We train R-50 using AdamW (Loshchilov & Hutter, 2019) optimizer for 100 epochs with a batch size of 100. The initial learning rate per dataset is shown in Table 4. The learning rate is decayed by 0.1 for Cont and by 0.5 for MS and PA on CUB200 and Cars196. For SOP and In-Shop, we decay the learning rate by 0.25 for all losses. The weight decay is set to 0.0001. |
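
The "Research Type" row above quotes the paper's core idea: mixing inputs, intermediate representations, or embeddings together with interpolated target labels. The snippet below is a minimal sketch of embedding-level mixup under a contrastive-style loss; it is not the authors' released Metrix implementation, and the function names, the Beta(alpha, alpha) interpolation, and the margin value are illustrative assumptions.

```python
# Minimal sketch of embedding-level mixup for deep metric learning
# (illustrative only; this is NOT the authors' released Metrix code).
import torch
import torch.nn.functional as F

def mix_embeddings(positives, negatives, alpha=2.0):
    """Interpolate a positive and a negative embedding for each anchor,
    returning the mixed embeddings and the mixing coefficient lambda."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    mixed = lam * positives + (1.0 - lam) * negatives
    return mixed, lam

def contrastive_mixup_loss(anchors, positives, negatives, alpha=2.0, margin=0.5):
    """Contrastive-style loss on mixed embeddings with interpolated targets:
    the mixed example counts as lam-positive and (1 - lam)-negative."""
    anchors = F.normalize(anchors, dim=1)
    positives = F.normalize(positives, dim=1)
    negatives = F.normalize(negatives, dim=1)
    mixed, lam = mix_embeddings(positives, negatives, alpha)
    mixed = F.normalize(mixed, dim=1)
    sim = (anchors * mixed).sum(dim=1)             # cosine similarity to mixed embeddings
    pos_term = lam * (1.0 - sim)                   # pull toward the "positive" fraction
    neg_term = (1.0 - lam) * F.relu(sim - margin)  # push away the "negative" fraction
    return (pos_term + neg_term).mean()

# Usage: anchors, positives, negatives are [batch, dim] embeddings from the encoder.
# loss = contrastive_mixup_loss(anchors, positives, negatives)
```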
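
For the "Experiment Setup" row, the reported hyperparameters (AdamW, 100 epochs, batch size 100, weight decay 0.0001, dataset-dependent learning-rate decay) can be expressed as a short PyTorch configuration sketch. The base learning rate and the decay milestones below are placeholders, since the paper reports the initial learning rate per dataset in its Table 4 and the quoted text does not give decay epochs.

```python
# Sketch of the reported optimization setup: AdamW, 100 epochs, batch size 100,
# weight decay 0.0001. Base learning rate and decay milestones are PLACEHOLDERS.
import torch

def build_optimizer(model, base_lr=1e-4, weight_decay=1e-4):
    # AdamW optimizer with weight decay 0.0001, as quoted above.
    return torch.optim.AdamW(model.parameters(), lr=base_lr, weight_decay=weight_decay)

def build_scheduler(optimizer, decay_factor=0.5, milestones=(50, 75)):
    # Decay factor depends on loss/dataset: 0.1 for Cont, 0.5 for MS and PA on
    # CUB200/Cars196, 0.25 for all losses on SOP and In-Shop. Milestone epochs
    # are assumed; the quoted setup does not specify them.
    return torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=list(milestones), gamma=decay_factor
    )
```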