Understanding Contrastive Learning via Distributionally Robust Optimization

Authors: Junkang Wu, Jiawei Chen, Jiancan Wu, Wentao Shi, Xiang Wang, Xiangnan He

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on various domains (image, sentence, and graphs) validate the effectiveness of the proposal.
Researcher Affiliation | Academia | 1 University of Science and Technology of China; 2 Zhejiang University
Pseudocode | Yes | Figure 5: Pseudocode for our proposed adjusted InfoNCE objective, as well as the original NCE contrastive objective. (See the InfoNCE sketch after the table.)
Open Source Code | Yes | The code is available at https://github.com/junkangwu/ADNCE.
Open Datasets | Yes | We empirically evaluate CL on two benchmark datasets, CIFAR10 and STL10, with the findings detailed in Table 1.
Dataset Splits | Yes | Each parameter was repeated from scratch five times, and the best parameter was selected by evaluating on the validation dataset.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud computing instance types) were mentioned for the experimental setup.
Software Dependencies | No | The paper mentions software components such as PyTorch and specific models (e.g., ResNet-50, BERT, RoBERTa), but it does not provide version numbers for these or other software dependencies.
Experiment Setup | Yes | Specifically, we use the ResNet-50 network as the backbone. To ensure a fair comparison, we set the embedding dimension to 2048 (the representation used in linear readout) and project it into a 128-dimensional space (the actual embedding used for contrastive learning). Regarding the temperature parameter τ, we use the default value τ0 of 0.5 adopted in most studies, and we also perform a grid search on τ from 0.1 to 1.0 at an interval of 0.1, denoted by τ. We use the Adam optimizer with a learning rate of 0.001 and weight decay of 1e-6. All models are trained for 400 epochs. (See the training-setup sketch after the table.)
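
For reference alongside the pseudocode entry above, the following is a minimal sketch of the standard InfoNCE contrastive objective in PyTorch; the paper's adjusted variant is not reproduced here, and the function and variable names (e.g., `info_nce_loss`, `z1`, `z2`) are illustrative rather than the authors'.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.5):
    """Standard InfoNCE loss for a batch of positive pairs (z1[i], z2[i]).

    z1, z2: [batch, dim] projected embeddings of two augmented views.
    Every non-matching pair in the batch acts as a negative.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    # Cosine similarity of each view-1 embedding against every view-2 embedding.
    logits = z1 @ z2.t() / temperature          # [batch, batch]
    labels = torch.arange(z1.size(0), device=z1.device)
    # Diagonal entries are the positives; off-diagonal entries are negatives.
    return F.cross_entropy(logits, labels)
```

Given two augmented batches `x1` and `x2` and an encoder `model`, the loss would be computed as `info_nce_loss(model(x1), model(x2), temperature=0.5)`.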
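
The Experiment Setup row quotes concrete hyperparameters; the sketch below shows one way they might be wired together (ResNet-50 backbone with a 2048-to-128 projection head, Adam with learning rate 0.001 and weight decay 1e-6, and a τ grid from 0.1 to 1.0). This is an assumption-laden illustration: the class name `ContrastiveModel` and the two-layer projector structure are placeholders, not taken from the paper's code.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class ContrastiveModel(nn.Module):
    """ResNet-50 backbone (2048-d readout) with a 128-d projection head,
    mirroring the configuration quoted in the Experiment Setup row."""
    def __init__(self, proj_dim=128):
        super().__init__()
        self.backbone = resnet50()
        self.backbone.fc = nn.Identity()           # keep the 2048-d representation
        self.projector = nn.Sequential(            # projector layout is assumed
            nn.Linear(2048, 2048), nn.ReLU(inplace=True),
            nn.Linear(2048, proj_dim),
        )

    def forward(self, x):
        h = self.backbone(x)                       # features used for linear readout
        return self.projector(h)                   # embedding used for the contrastive loss

model = ContrastiveModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-6)
temperatures = [round(0.1 * k, 1) for k in range(1, 11)]   # grid search over τ
num_epochs = 400
```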