Deep Causal Metric Learning
Authors: Xiang Deng, Zhongfei Zhang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on several benchmark datasets demonstrate the superiority of DCML over the existing methods. Section 4 (Experiments): In this section, we first introduce the experimental settings, then report the comparison results between DCML and the state-of-the-art (SOTA) approaches, and finally present the ablation studies and qualitative results. |
| Researcher Affiliation | Academia | 1Department of Computer Science, State University of New York at Binghamton, NY, US. Correspondence to: Xiang Deng <xdeng7@binghamton.edu>. |
| Pseudocode | Yes | Algorithm 1 DCML. Input: training data D, encoder f, attention MLP Tθ. for i = 1 to N epochs do: if i % e == 0 then: for j = 1 to M do: update sample weights (environments) with (11); end for; end if; update the model parameters by minimizing (10); end for |
| Open Source Code | Yes | Code: https://github.com/Xiang-Deng-DL/DCML. |
| Open Datasets | Yes | Datasets. Following the existing literature, we adopt the three widely used metric learning benchmark datasets, i.e., CUB-200 (Wah et al., 2011), Cars-196 (Krause et al., 2013), and Stanford Online Products (SOP) (Oh Song et al., 2016). |
| Dataset Splits | Yes | 4-fold cross validation on the first half of the classes in each dataset is used for training the model. Specifically, the first half of classes are divided into 4 partitions deterministically. 3 of the 4 partitions are used as the training dataset and the remaining 1 as the validation dataset for tuning the hyper-parameters. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions optimizers (RMSprop) and backbone architectures (BN-Inception), but does not provide specific version numbers for software dependencies like programming languages, libraries (e.g., PyTorch, TensorFlow), or CUDA. |
| Experiment Setup | Yes | DCML has 3 hyper-parameters, i.e., α, β, and γ. Instead of using grid search, which is time-consuming, we do a very simple search. We first fix α and γ to 0, and tune β in [0, 1]. After the optimal β is obtained, we fix β and α and search for the optimal γ in [0, 1]. Finally, we search α while fixing β and γ. Their final values on each dataset are given in the GitHub repository: https://github.com/Xiang-Deng-DL/DCML. For the hyper-parameters in the proxy loss, we set them to the values searched by (Musgrave et al., 2020). The model is trained with the RMSprop optimizer. The learning rates for the backbone and the attention net are set to 1e-6 and 2e-6, respectively. The learning rate for the class proxy vectors is set to the values searched by (Musgrave et al., 2020), i.e., 2.53e-3, 7.41e-3, and 2.16e-3 on CUB-200, Cars-196, and SOP, respectively. |
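The training loop quoted in the Pseudocode row (Algorithm 1) can be sketched as plain Python. This is only an illustrative skeleton, not the paper's implementation: `update_sample_weights` and `train_step` are hypothetical stand-ins for the environment update of Eq. (11) and the minimization of the objective in Eq. (10).

```python
# Sketch of the Algorithm 1 training loop. The two callbacks are
# hypothetical placeholders, not names from the paper's code.

def train_dcml(data, n_epochs, e, m, update_sample_weights, train_step):
    """Alternate periodic environment (sample-weight) updates with model updates."""
    log = []  # record of what happened at each epoch, for illustration
    for i in range(1, n_epochs + 1):
        if i % e == 0:
            # Every e epochs, re-infer environments for M inner steps (Eq. (11)).
            for _ in range(m):
                update_sample_weights(data)
            log.append(("reweight", i))
        # One epoch of model training: minimize the DCML objective (Eq. (10)).
        train_step(data)
        log.append(("train", i))
    return log
```

In a real setting the callbacks would carry optimizer state (e.g., RMSprop over the backbone and attention net at the learning rates listed above); here they are kept abstract so the control flow of the algorithm is the only thing shown.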
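The Dataset Splits row describes a deterministic 4-fold cross-validation over the first half of the classes. A minimal sketch of such a split, under the assumption of a simple round-robin partition (the paper's exact partitioning rule is not quoted), could look like:

```python
# Illustrative deterministic 4-fold class split: the first half of the
# classes is partitioned into 4 folds; each run trains on 3 folds and
# validates on the remaining one. The round-robin rule is an assumption.

def four_fold_splits(class_ids):
    """Yield (train_classes, val_classes) for each of the 4 folds."""
    half = sorted(class_ids)[: len(class_ids) // 2]  # first half of the classes
    folds = [half[k::4] for k in range(4)]           # deterministic partition
    for k in range(4):
        val = folds[k]
        train = [c for j, fold in enumerate(folds) if j != k for c in fold]
        yield train, val
```

Splitting by class rather than by image matches the standard metric-learning protocol, where test classes are disjoint from training classes.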
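The Experiment Setup row describes a coordinate-wise search over (α, β, γ) instead of a full 3-D grid search: β is tuned first with α = γ = 0, then γ, then α. A hedged sketch of that procedure, where `evaluate` and the candidate grid are hypothetical stand-ins for validation performance and the search range:

```python
# Greedy one-coordinate-at-a-time hyper-parameter search, as described in
# the setup: tune beta with alpha = gamma = 0, then gamma, then alpha.
# `evaluate` (higher is better) and `grid` are illustrative placeholders.

def coordinate_search(evaluate, grid):
    """Return (alpha, beta, gamma) found by three 1-D searches over `grid`."""
    alpha, beta, gamma = 0.0, 0.0, 0.0
    beta = max(grid, key=lambda b: evaluate(alpha, b, gamma))   # step 1: tune beta
    gamma = max(grid, key=lambda g: evaluate(alpha, beta, g))   # step 2: tune gamma
    alpha = max(grid, key=lambda a: evaluate(a, beta, gamma))   # step 3: tune alpha
    return alpha, beta, gamma
```

With a grid of G candidates this costs 3G evaluations instead of G³ for a full grid search, which is the time saving the setup alludes to; the trade-off is that the greedy search can miss jointly optimal settings when the hyper-parameters interact.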