Do Generated Data Always Help Contrastive Learning?

Authors: Yifei Wang, Jizhe Zhang, Yisen Wang

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that the proposed approach improves downstream accuracy significantly at no extra cost, and it is particularly beneficial for data-scarce scenarios.
Researcher Affiliation | Academia | 1 School of Mathematical Sciences, Peking University; 2 Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University; 3 National Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; 4 Institute for Artificial Intelligence, Peking University
Pseudocode | No | No pseudocode or algorithm blocks are present in the paper.
Open Source Code | Yes | Code is available at https://github.com/PKU-ML/adainf.
Open Datasets | Yes | We conduct experiments on three benchmark datasets: CIFAR-10, CIFAR-100, and Tiny ImageNet.
Dataset Splits | No | The paper does not explicitly describe training/validation/test splits or mention a validation set for hyperparameter tuning; it discusses only training and test data.
Hardware Specification | Yes | All of the models are pretrained with 4 NVIDIA GTX 3090 GPUs.
Software Dependencies | No | The paper mentions using the "solo-learn library (da Costa et al., 2022)" but does not specify a version number for this or any other software dependency.
Experiment Setup | Yes | For a fair comparison of inflated and non-inflated training, we train the model for 100k steps in all cases... Specifically, we weaken the two most important augmentations: the min scale of random resized cropping increases from 0.08 to 0.2; the Color Jitter strength decreases from 1 to 0.5; and the probability of applying Color Jitter decreases from 0.8 to 0.4.
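The augmentation weakening quoted in the Experiment Setup row can be sketched as a small hyperparameter transform. This is a hypothetical illustration, not the authors' code: the `strong` dict and `weaken` function are our names, and only the numeric values (0.08 to 0.2, 1 to 0.5, 0.8 to 0.4) come from the paper.

```python
# Hypothetical sketch of the augmentation weakening described in the
# Experiment Setup row; parameter names are ours, values are from the paper.

# Standard (strong) contrastive-learning augmentation hyperparameters.
strong = {
    "crop_min_scale": 0.08,   # RandomResizedCrop minimum area scale
    "jitter_strength": 1.0,   # ColorJitter strength multiplier
    "jitter_prob": 0.8,       # probability of applying ColorJitter
}

def weaken(cfg):
    """Weaken the two most important augmentations as the paper describes."""
    return {
        "crop_min_scale": 0.2,                           # 0.08 -> 0.2
        "jitter_strength": cfg["jitter_strength"] * 0.5, # 1.0 -> 0.5
        "jitter_prob": cfg["jitter_prob"] * 0.5,         # 0.8 -> 0.4
    }

weak = weaken(strong)
print(weak)  # {'crop_min_scale': 0.2, 'jitter_strength': 0.5, 'jitter_prob': 0.4}
```

In practice these values would be passed to an augmentation pipeline (e.g. a torchvision `RandomResizedCrop(scale=(crop_min_scale, 1.0))` and a `ColorJitter` applied with probability `jitter_prob`), with the weak version used for the inflated (generated) data.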