Do Generated Data Always Help Contrastive Learning?

Authors: Yifei Wang, Jizhe Zhang, Yisen Wang

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that the proposed approach improves downstream accuracy significantly at no extra cost, and it is particularly beneficial for data-scarce scenarios.
Researcher Affiliation | Academia | 1 School of Mathematical Sciences, Peking University; 2 Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University; 3 National Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; 4 Institute for Artificial Intelligence, Peking University
Pseudocode | No | No pseudocode or algorithm blocks are present in the paper.
Open Source Code | Yes | Code is available at https://github.com/PKU-ML/adainf.
Open Datasets | Yes | We conduct experiments on three benchmark datasets: CIFAR-10, CIFAR-100, and Tiny ImageNet.
Dataset Splits | No | The paper does not explicitly describe training/validation/test splits or mention a validation set for hyperparameter tuning; it discusses only training and test data.
Hardware Specification | Yes | All of the models are pretrained with 4 NVIDIA GTX 3090 GPUs.
Software Dependencies | No | The paper mentions using the "solo-learn library (da Costa et al., 2022)" but does not specify a version number for this or any other software dependency.
Experiment Setup | Yes | For a fair comparison of inflated and non-inflated training, we train the model for 100k steps in all cases... Specifically, we weaken the two most important augmentations: the min scale of random resized cropping increases from 0.08 to 0.2; the Color Jitter strength decreases from 1 to 0.5; and the probability of applying Color Jitter decreases from 0.8 to 0.4.
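The augmentation weakening quoted in the Experiment Setup row can be sketched as a small hyperparameter transform. This is a hypothetical illustration, not the authors' code: the `strong` dict and `weaken` function are our names, and only the numeric values (0.08 to 0.2, 1 to 0.5, 0.8 to 0.4) come from the paper.

```python
# Hypothetical sketch of the augmentation weakening described in the
# Experiment Setup row; parameter names are ours, values are from the paper.

# Standard (strong) contrastive-learning augmentation hyperparameters.
strong = {
    "crop_min_scale": 0.08,   # RandomResizedCrop minimum area scale
    "jitter_strength": 1.0,   # ColorJitter strength multiplier
    "jitter_prob": 0.8,       # probability of applying ColorJitter
}

def weaken(cfg):
    """Weaken the two most important augmentations as the paper describes."""
    return {
        "crop_min_scale": 0.2,                           # 0.08 -> 0.2
        "jitter_strength": cfg["jitter_strength"] * 0.5, # 1.0 -> 0.5
        "jitter_prob": cfg["jitter_prob"] * 0.5,         # 0.8 -> 0.4
    }

weak = weaken(strong)
print(weak)  # {'crop_min_scale': 0.2, 'jitter_strength': 0.5, 'jitter_prob': 0.4}
```

In practice these values would be passed to an augmentation pipeline (e.g. a torchvision `RandomResizedCrop(scale=(crop_min_scale, 1.0))` and a `ColorJitter` applied with probability `jitter_prob`), with the weak version used for the inflated (generated) data.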