Rethinking Weak Supervision in Helping Contrastive Learning
Authors: Jingyi Cui, Weiran Huang, Yifei Wang, Yisen Wang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We for the first time establish a theoretical framework for contrastive learning under weak supervision, including noisy label learning and semi-supervised learning. By formulating the label information into a similarity graph based on the posterior probability of labels, we derive the downstream error bound of jointly trained contrastive learning losses. We prove that semi-supervised labels improve the downstream error bound compared with unsupervised learning, whereas under the noisy-labeled setting, joint training fails to improve the error bound compared with the winner of supervised and unsupervised contrastive learning. We empirically verify that noisy labels have only limited help to contrastive representation learning under the paradigm of joint training. (See the similarity-graph sketch below the table.) |
| Researcher Affiliation | Collaboration | 1 National Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; 2 Qing Yuan Research Institute, Shanghai Jiao Tong University; 3 Huawei Noah's Ark Lab; 4 School of Mathematical Sciences, Peking University; 5 Institute for Artificial Intelligence, Peking University. |
| Pseudocode | Yes | We show the procedures for semi-supervised contrastive learning in Algorithm 1. Algorithm 1: Joint Training for Semi-supervised Learning (a PyTorch sketch of this joint-training loop appears below the table) |
| Open Source Code | No | No explicit statement providing concrete access to source code for the methodology, such as a repository link or a general code release statement, was found. |
| Open Datasets | Yes | We conduct numerical comparisons on the CIFAR-10 and Tiny ImageNet-200 benchmark datasets. |
| Dataset Splits | No | The paper mentions using CIFAR-10 and Tiny ImageNet-200 and evaluating on 'clean testing data' via linear probing. However, it does not explicitly specify the training/validation/test splits (percentages, counts, or specific predefined split citations) used for reproducibility. |
| Hardware Specification | Yes | We run experiments on 4 NVIDIA Tesla V100 32GB GPUs. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., library or framework versions like PyTorch 1.9, Python 3.8) were explicitly mentioned in the paper. |
| Experiment Setup | Yes | Specifically, we use ResNet-50 as the encoder and a 2-layer MLP as the projection head. We set the batch size as 1024. We use 1000 epochs for training representations. We use the SGD optimizer with the learning rate 0.5 decayed at the 700-th, 800-th, and 900-th epochs with a weight decay 0.1. We run experiments on 4 NVIDIA Tesla V100 32GB GPUs. (See the configuration sketch below the table.) |
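Regarding the Research Type row: the paper encodes weak-label information as a similarity graph built from label posteriors. The sketch below is purely illustrative; the specific weighting `w_ij = sum_k P(y=k|x_i) P(y=k|x_j)` is an assumption chosen for clarity, not necessarily the authors' exact construction.

```python
import numpy as np

# Illustrative only: one natural way to turn label posteriors P(y = k | x_i)
# into a pairwise similarity graph is to weight each pair by the probability
# that the two samples share a label,
#     w_ij = sum_k P(y = k | x_i) * P(y = k | x_j).
# This weighting is an assumption for illustration, not the paper's exact form.
def posterior_similarity_graph(posteriors: np.ndarray) -> np.ndarray:
    """posteriors: (n, K) array; each row is a distribution over K classes."""
    return posteriors @ posteriors.T

# Toy example: three samples, two classes.
p = np.array([[0.9, 0.1],   # confidently class 0
              [0.8, 0.2],   # noisily class 0
              [0.1, 0.9]])  # confidently class 1
print(posterior_similarity_graph(p).round(2))
```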
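Regarding the Pseudocode row: the quoted Algorithm 1 jointly trains an unsupervised contrastive loss on all samples with a supervised term on the labeled subset. Since no source code is released, the following is a minimal PyTorch sketch under that assumption; the particular InfoNCE/SupCon forms and the weighting `lam` are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.5):
    """Unsupervised contrastive loss between two augmented views of a batch."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                   # (B, B) similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)              # match view i with view i

def sup_con(z, labels, temperature=0.5):
    """Supervised contrastive term: samples sharing a label are positives."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature
    eye = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, float('-inf'))             # drop self-similarity
    pos = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~eye
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    per_sample = -log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
    return per_sample.mean()

def joint_loss(z1, z2, labels, labeled_mask, lam=1.0):
    """Unsupervised loss on every sample plus a supervised loss on labeled ones."""
    loss = info_nce(z1, z2)
    if labeled_mask.sum() > 1:                            # need >= 2 labeled samples
        loss = loss + lam * sup_con(z1[labeled_mask], labels[labeled_mask])
    return loss
```

In the semi-supervised setting, `labeled_mask` flags the small labeled subset; in the noisy-label setting every sample is labeled but `labels` may be corrupted, which is exactly the regime where the paper finds joint training adds little over the better of supervised and unsupervised contrastive learning.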
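Regarding the Experiment Setup row: the quoted hyperparameters translate roughly into the PyTorch configuration below. The learning-rate decay factor, SGD momentum, and projection-head output width are not stated in the quoted text and are assumptions here.

```python
import torch
import torchvision

# Backbone: ResNet-50 with the classification head removed (2048-d features).
encoder = torchvision.models.resnet50()
encoder.fc = torch.nn.Identity()

# 2-layer MLP projection head; the 128-d output width is an assumption.
projector = torch.nn.Sequential(
    torch.nn.Linear(2048, 2048),
    torch.nn.ReLU(inplace=True),
    torch.nn.Linear(2048, 128),
)
model = torch.nn.Sequential(encoder, projector)

# Reported: SGD, lr 0.5, weight decay 0.1, batch size 1024, 1000 epochs,
# decay at epochs 700/800/900.  Momentum and the decay factor are assumptions.
optimizer = torch.optim.SGD(model.parameters(), lr=0.5,
                            momentum=0.9, weight_decay=0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[700, 800, 900], gamma=0.1)     # stepped once per epoch

batch_size, num_epochs = 1024, 1000
```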