Understanding Contrastive Learning via Distributionally Robust Optimization
Authors: Junkang Wu, Jiawei Chen, Jiancan Wu, Wentao Shi, Xiang Wang, Xiangnan He
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various domains (images, sentences, and graphs) validate the effectiveness of the proposal. |
| Researcher Affiliation | Academia | University of Science and Technology of China; Zhejiang University |
| Pseudocode | Yes | Figure 5: Pseudocode for our proposed adjusted InfoNCE objective, as well as the original InfoNCE contrastive objective. (A hedged sketch of both losses follows the table.) |
| Open Source Code | Yes | The code is available at https://github.com/junkangwu/ADNCE. |
| Open Datasets | Yes | We empirically evaluate CL on two benchmark datasets, CIFAR10 and STL10, with the findings detailed in Table 1. |
| Dataset Splits | Yes | Each setting was run from scratch five times, and the best parameters were selected by evaluating on the validation dataset. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud computing instance types) were mentioned for the experimental setup. |
| Software Dependencies | No | The paper mentions software components like "PyTorch" and specific models (e.g., "ResNet-50", "BERT", "RoBERTa"), but it does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Specifically, we use the ResNet-50 network as the backbone. To ensure a fair comparison, we set the embedding dimension to 2048 (the representation used in linear readout) and project it into a 128-dimensional space (the actual embedding used for contrastive learning). For the temperature parameter τ, we use the default value τ0 = 0.5 adopted in most studies, and we also perform a grid search over τ from 0.1 to 1.0 at intervals of 0.1, denoted by τ*. We use the Adam optimizer with a learning rate of 0.001 and a weight decay of 1e-6. All models are trained for 400 epochs. (A minimal setup sketch follows the table.) |
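
For the pseudocode row above, the following is a minimal PyTorch sketch of the standard InfoNCE loss and of one plausible reading of the adjusted (ADNCE) objective, in which negatives are reweighted by a Gaussian-shaped function of their similarity scores. The exact weighting form, its normalization, and the parameter names `mu` and `sigma` are assumptions on our part; the authoritative version is Figure 5 of the paper and the code at https://github.com/junkangwu/ADNCE.

```python
import math
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.5):
    """Standard InfoNCE loss for a batch of positive pairs (z1[i], z2[i]).

    All other in-batch samples serve as negatives.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / tau                          # (B, B) similarity logits
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(sim, labels)              # diagonal entries are positives

def adjusted_info_nce(z1, z2, tau=0.5, mu=0.5, sigma=1.0):
    """Sketch of the adjusted InfoNCE (ADNCE) idea: reweight each negative by a
    Gaussian-shaped weight over its similarity score, so the emphasis on hard
    negatives can be tuned via (mu, sigma). Weighting details are assumptions.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = z1 @ z2.t()                                # raw cosine similarities
    B = z1.size(0)
    eye = torch.eye(B, dtype=torch.bool, device=z1.device)
    pos = torch.exp(sim[eye] / tau)                  # (B,) positive terms
    neg_sim = sim[~eye].view(B, B - 1)               # (B, B-1) negative similarities
    neg = torch.exp(neg_sim / tau)
    # Gaussian-shaped weights over negative similarities, normalized to mean 1
    w = torch.exp(-(neg_sim - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    w = w / w.mean(dim=1, keepdim=True)
    loss = -torch.log(pos / (pos + (w * neg).sum(dim=1)))
    return loss.mean()
```

Under this sketch's assumptions, setting `mu` near the similarity of hard negatives with a small `sigma` concentrates the loss on those negatives, while letting `sigma` grow drives the weights toward uniform and recovers plain InfoNCE.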
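
And a minimal sketch of the training setup described in the last row, assuming a SimCLR-style two-layer MLP projection head (the paper states only the 2048 → 128 dimensions, so the head structure here is an assumption):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

# Backbone: ResNet-50 with its classifier removed; the 2048-d pooled features
# are used for the linear readout, and a projection head maps them into the
# 128-d space used for contrastive learning.
backbone = resnet50()
backbone.fc = nn.Identity()            # expose the 2048-d representation

projector = nn.Sequential(             # two-layer MLP head: an assumption
    nn.Linear(2048, 2048),
    nn.ReLU(inplace=True),
    nn.Linear(2048, 128),
)

model = nn.Sequential(backbone, projector)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-6)
epochs = 400
tau = 0.5  # default temperature; the paper also grid-searches 0.1–1.0 in steps of 0.1
```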