Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
An Augmentation-Aware Theory for Self-Supervised Contrastive Learning
Authors: Jingyi Cui, Hongwei Wen, Yisen Wang
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Lastly, we conduct both pixel- and representation-level experiments to verify our proposed theoretical results. ... We conduct numerical comparisons on the CIFAR-100 and Tiny Imagenet benchmark datasets. For conciseness of presentation, we delay the figures regarding Tiny Imagenet to Appendix C. We follow the experimental settings of SimCLR (Chen et al., 2020a). |
| Researcher Affiliation | Academia | 1State Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University 2Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente 3Institute for Artificial Intelligence, Peking University. Correspondence to: Yisen Wang <EMAIL>. |
| Pseudocode | No | The paper presents mathematical formulations, theorems, and proofs but does not include any explicitly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository. |
| Open Datasets | Yes | We conduct numerical comparisons on the CIFAR-100 and Tiny Imagenet benchmark datasets. |
| Dataset Splits | Yes | We evaluate the self-supervised learned representation through linear probing, i.e., we train a linear classifier on top of the encoder for 100 epochs and report its test accuracy. (Implies standard splits for CIFAR-100 and Tiny Imagenet) |
| Hardware Specification | Yes | We run all experiments on an NVIDIA GeForce RTX 3090 24GB GPU. |
| Software Dependencies | No | The paper mentions following the experimental settings of SimCLR (Chen et al., 2020a) but does not provide specific version numbers for software dependencies or libraries used. |
| Experiment Setup | Yes | We set the batch size as 1024 and use 1000 epochs for training representations. We use the SGD optimizer with the learning rate 0.5 decayed at the 700-th, 800-th, and 900-th epochs with a weight decay 0.1. |
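The quoted setup describes a step-decay learning-rate schedule, although the decay factor itself is not stated in the excerpt. As a minimal sketch, assuming the common factor of 0.1 per milestone (an assumption, not confirmed by the paper), the schedule can be expressed as:

```python
def lr_at_epoch(epoch, base_lr=0.5, milestones=(700, 800, 900), gamma=0.1):
    """Step-decay schedule: multiply base_lr by gamma for each milestone passed.

    gamma=0.1 is an assumed decay factor; the paper only states that the
    learning rate of 0.5 is decayed at epochs 700, 800, and 900.
    """
    return base_lr * gamma ** sum(epoch >= m for m in milestones)

# Before any milestone the learning rate stays at the base value of 0.5;
# after each of the three milestones it drops by the assumed factor.
print(lr_at_epoch(0))    # → 0.5
print(lr_at_epoch(750))  # → 0.05
```

In a PyTorch-style setup, this corresponds to an SGD optimizer wrapped in a multi-step scheduler with those milestones, alongside the stated batch size of 1024 and 1000 training epochs.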