Understanding and Generalizing Contrastive Learning from the Inverse Optimal Transport Perspective
Authors: Liangliang Shi, Gu Zhang, Haoyu Zhen, Jintao Fan, Junchi Yan
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results on vision benchmarks show the effectiveness of our derived loss family and the new uniformity term. Table 1: Top-1 classification accuracy (%) of using the proposed IOT-CL loss (without uniformity penalty) evaluated by linear networks when varying the relaxation of constraints in U with 100/200 epochs of training. |
| Researcher Affiliation | Academia | 1Department of Computer Science and Engineering, and MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University. Correspondence to: Junchi Yan <yanjunchi@sjtu.edu.cn>. |
| Pseudocode | Yes | Algorithm 1 Computing the IOT induced Contrastive loss (IOT-CL) under U(a, b) (under the framework of SimCLR) |
| Open Source Code | No | The paper does not explicitly state that the source code for the methodology is available or provide a link to a code repository. |
| Open Datasets | Yes | We test our loss on CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011) and ImageNet-100 (Deng et al., 2009). |
| Dataset Splits | Yes | Datasets and Pretraining. We test our loss on CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011) and ImageNet-100 (Deng et al., 2009). With all convolutional layers frozen, we first validate the performance of the pretrained models on linear classification. |
| Hardware Specification | Yes | Experiments run on a single RTX-3090 (24 GB) GPU with 128 GB memory and 24 physical CPU cores at 3.50 GHz. |
| Software Dependencies | No | The paper mentions 'PyTorch's automatic differentiation functions' but does not provide specific version numbers for this or any other software dependencies. |
| Experiment Setup | Yes | All the models are pretrained on CIFAR-10/100 and SVHN with Adam (Kingma & Ba, 2014) for 500 epochs by 3e-4 learning rate with a mini-batch size of 128, while the pretraining epoch is reduced to 100 for ImageNet-100. We set τ = 0.5 for Softmax-based methods. For the pretraining stage, all models are trained with SGD for 100 epochs by 0.03 learning rate with batch size being 128; the momentum and weight decay of SGD are 0.9 and 1e-4 respectively. The temperature τ is set to 0.07 for softmax-based methods. We set the size of the memory bank to 4096. The feature dimension is 128 and the momentum of updating the key encoder is 0.999. |
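For readers assessing reproducibility, the table's Pseudocode row refers to Algorithm 1 (computing the IOT-induced contrastive loss under the constraint polytope U(a, b) in a SimCLR-style setup). The sketch below is a hedged, minimal NumPy illustration of that general idea, not the paper's exact algorithm: the function names `sinkhorn` and `iot_cl_loss`, the uniform marginals, the cosine-similarity cost, and the diagonal ground-truth matching are all assumptions made for illustration.

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.5, n_iters=50):
    """Entropic OT: return plan P = diag(u) K diag(v) with K = exp(-C / eps).

    Alternately rescales rows/columns so P's marginals approach (a, b).
    eps plays the role of the temperature tau in softmax-based losses.
    """
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)   # match column marginal b
        u = a / (K @ v)     # match row marginal a
    return u[:, None] * K * v[None, :]

def iot_cl_loss(z1, z2, eps=0.5, n_iters=5):
    """Hypothetical IOT-style contrastive loss (illustration only):
    cross-entropy between the Sinkhorn plan over view similarities and
    the ground-truth diagonal matching of the two augmented views."""
    # L2-normalize embeddings so the cost is negative cosine similarity
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    C = -z1 @ z2.T
    n = z1.shape[0]
    a = np.full(n, 1.0 / n)  # uniform marginals: one choice inside U(a, b)
    b = np.full(n, 1.0 / n)
    P = sinkhorn(C, a, b, eps=eps, n_iters=n_iters)
    # positives sit on the diagonal (sample i in view 1 matches i in view 2)
    return -np.mean(np.log(np.diag(P) + 1e-12))
```

With a single Sinkhorn iteration that enforces only the row marginal, the plan reduces to a row-wise softmax of similarities, which is the familiar InfoNCE form; more iterations or relaxed constraints in U give other members of the loss family the table's Table 1 quote refers to.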