Deep Causal Metric Learning

Authors: Xiang Deng, Zhongfei Zhang

ICML 2022

Reproducibility assessment. Each entry below gives the reproducibility variable, the result, and the supporting LLM response.

Research Type: Experimental
LLM Response: "Extensive experiments on several benchmark datasets demonstrate the superiority of DCML over the existing methods." "4. Experiments. In this section, we first introduce the experimental settings, then report the comparison results between DCML and the state-of-the-art (SOTA) approaches, and finally present the ablation studies and qualitative results."

Researcher Affiliation: Academia
LLM Response: "Department of Computer Science, State University of New York at Binghamton, NY, US. Correspondence to: Xiang Deng <xdeng7@binghamton.edu>."

Pseudocode: Yes
LLM Response: Algorithm 1 (DCML)
    Input: training data D, encoder f, attention MLP Tθ
    for i = 1 to N epochs do
        if i % e == 0 then
            for j = 1 to M do
                Update the sample weights (environments) with Eq. (11)
            end for
        end if
        Update the model parameters by minimizing Eq. (10)
    end for
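
For a concrete picture of the training loop, the following is a minimal PyTorch-style sketch of Algorithm 1. The helpers `update_sample_weights` and `dcml_loss` are hypothetical stand-ins for Eq. (11) and Eq. (10); the actual implementations are in the authors' repository.

```python
def train_dcml(loader, encoder, attention_mlp, optimizer,
               num_epochs, e, M, update_sample_weights, dcml_loss):
    """Sketch of Algorithm 1 (DCML).

    `update_sample_weights` and `dcml_loss` are hypothetical callbacks that
    stand in for Eq. (11) and Eq. (10) of the paper.
    """
    for epoch in range(1, num_epochs + 1):
        # Every e epochs, re-estimate the sample weights (environments), Eq. (11).
        if epoch % e == 0:
            for _ in range(M):
                update_sample_weights(encoder, attention_mlp, loader)
        # Update the encoder and attention net by minimizing the objective, Eq. (10).
        for images, labels in loader:
            optimizer.zero_grad()
            loss = dcml_loss(encoder, attention_mlp, images, labels)
            loss.backward()
            optimizer.step()
```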

Open Source Code: Yes
LLM Response: "Code: https://github.com/Xiang-Deng-DL/DCML." "Their final values on each dataset are given in the Github repository: https://github.com/Xiang-Deng-DL/DCML."

Open Datasets: Yes
LLM Response: "Datasets. Following the existing literature, we adopt the three widely used metric learning benchmark datasets, i.e., CUB-200 (Wah et al., 2011), Cars-196 (Krause et al., 2013), and Stanford Online Products (SOP) (Oh Song et al., 2016)."

Dataset Splits: Yes
LLM Response: "4-fold cross validation on the first half of the classes in each dataset is used for training the model. Specifically, the first half of classes are divided into 4 partitions deterministically. 3 of the 4 partitions are used as the training dataset and the remaining 1 as the validation dataset for tuning the hyper-parameters."
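
This split protocol is simple enough to reconstruct. The sketch below follows the quoted description under the assumption that "deterministically" means partitioning the sorted class list without shuffling; it is an illustration, not the authors' code.

```python
def four_fold_splits(all_classes):
    """Deterministic 4-fold splits over the first half of the classes,
    following the quoted protocol (no shuffling is assumed)."""
    classes = sorted(all_classes)
    first_half = classes[: len(classes) // 2]          # second half held out for testing
    partitions = [first_half[i::4] for i in range(4)]  # 4 deterministic partitions
    folds = []
    for k in range(4):
        val_classes = set(partitions[k])               # 1 partition for validation
        train_classes = set(first_half) - val_classes  # 3 partitions for training
        folds.append((train_classes, val_classes))
    return folds

# Example with CUB-200: 200 classes -> 4 folds over the first 100 classes.
folds = four_fold_splits(range(200))
```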

Hardware Specification: No
LLM Response: The paper does not provide specific hardware details, such as the GPU models, CPU types, or memory used for the experiments.

Software Dependencies: No
LLM Response: The paper mentions the optimizer (RMSprop) and the backbone architecture (BN-Inception), but does not give version numbers for software dependencies such as the programming language, libraries (e.g., PyTorch or TensorFlow), or CUDA.

Experiment Setup: Yes
LLM Response: "DCML has 3 hyper-parameters, i.e., α, β, and γ. Instead of using grid search, which is time-consuming, we do a very simple search. We first fix α and γ to 0, and tune β in [0, 1]. After the optimal β is obtained, we fix β and α and search the optimal γ in [0, 1]. Finally, we search α while fixing β and γ. Their final values on each dataset are given in the Github repository: https://github.com/Xiang-Deng-DL/DCML. The hyper-parameters in the proxy loss are set to the values searched by Musgrave et al. (2020). The model is trained with the RMSprop optimizer. The learning rates for the backbone and the attention net are set to 1e-6 and 2e-6, respectively. The learning rates for the class proxy vectors are set to the values searched by Musgrave et al. (2020), i.e., 2.53e-3, 7.41e-3, and 2.16e-3 on CUB-200, Cars-196, and SOP, respectively."
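
The hedged sketch below captures the sequential hyper-parameter search and the optimizer configuration described in the quote. The `evaluate` callback, the candidate grid, and the placeholder module shapes are assumptions for illustration; they are not taken from the paper or the repository.

```python
import torch
import torch.nn as nn

def sequential_search(evaluate, candidates):
    """One-at-a-time search over (alpha, beta, gamma) as quoted above:
    fix alpha = gamma = 0 and pick beta, then pick gamma, then pick alpha.
    `evaluate(alpha, beta, gamma)` is a hypothetical callback returning the
    validation metric from the 4-fold protocol; `candidates` is an assumed
    grid of values in [0, 1]."""
    alpha, beta, gamma = 0.0, 0.0, 0.0
    beta = max(candidates, key=lambda b: evaluate(alpha, b, gamma))
    gamma = max(candidates, key=lambda g: evaluate(alpha, beta, g))
    alpha = max(candidates, key=lambda a: evaluate(a, beta, gamma))
    return alpha, beta, gamma

# RMSprop with the per-module learning rates quoted above (proxy lr shown for
# CUB-200; 7.41e-3 on Cars-196 and 2.16e-3 on SOP). The module shapes are
# placeholders, not the paper's BN-Inception backbone.
backbone = nn.Linear(512, 512)
attention_net = nn.Linear(512, 512)
proxies = nn.Parameter(torch.randn(100, 512))
optimizer = torch.optim.RMSprop([
    {"params": backbone.parameters(), "lr": 1e-6},
    {"params": attention_net.parameters(), "lr": 2e-6},
    {"params": [proxies], "lr": 2.53e-3},
])
```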