Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
The Power of Contrast for Feature Learning: A Theoretical Analysis
Authors: Wenlong Ji, Zhun Deng, Ryumei Nakada, James Zou, Linjun Zhang
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Despite its empirical success, theoretical understanding of the superiority of contrastive learning is still limited. In this paper, under linear representation settings, (i) we provably show that contrastive learning outperforms the standard autoencoders and generative adversarial networks... We verify our theory with numerical experiments. |
| Researcher Affiliation | Academia | Wenlong Ji (Department of Statistics, Stanford University, Stanford, CA 94305, USA); Zhun Deng (Department of Computer Science, Columbia University, New York, NY 10027, USA); Ryumei Nakada (Department of Statistics, Rutgers University, Piscataway, NJ 08854, USA); James Zou (Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA); Linjun Zhang (Department of Statistics, Rutgers University, Piscataway, NJ 08854, USA) |
| Pseudocode | No | The paper describes methodologies through mathematical formulations and prose, such as the formulation of contrastive loss functions in Equations (1), (2), (3), (4) and the data generating process in Equation (5), but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Our codes are implemented in Pytorch and run on an NVIDIA V100 GPU. No explicit statement of code release or repository link is provided for the methodology described in this paper. |
| Open Datasets | Yes | We conduct the experiments using the datasets STL-10 (Coates et al., 2011) and CIFAR10 (Krizhevsky, 2009) with the neural nets architecture ResNet-18 (He et al., 2016). |
| Dataset Splits | Yes | For both STL-10 and CIFAR-10 datasets, we divide the test data into two sets, one consists of the first five classes and the other one consists of the remaining five classes. During training, we use the training data as unlabeled data and the first set of test data as the labeled data to train the model jointly, and then train a linear classifier with the second set of test data on features learned by the encoder. |
| Hardware Specification | Yes | Our codes are implemented in Pytorch and run on an NVIDIA V100 GPU. |
| Software Dependencies | No | The paper mentions 'Pytorch' as the implementation framework and 'Adam optimizer (Kingma and Ba, 2015)', but specific version numbers for these software components are not provided. |
| Experiment Setup | Yes | All training is carried out with the Adam optimizer (Kingma and Ba, 2015), batch size 256, learning rate 3×10⁻⁴, weight decay 10⁻⁴, and a cosine annealing learning rate scheduler for 100 epochs. |
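The dataset-split protocol quoted above (first five classes of the test set used as labeled data during joint training, remaining five reserved for the linear probe) can be sketched as a simple partition by class index. This is a minimal illustration, not the authors' code; the function name `split_by_class` and the class boundary of 5 (matching the ten-class STL-10/CIFAR-10 setting) are assumptions drawn from the quoted description.

```python
def split_by_class(labels, boundary=5):
    """Partition example indices into two disjoint sets, following the
    protocol quoted in the report (hypothetical helper, not from the paper):
      - first:  classes < boundary, used as labeled data during joint training
      - second: classes >= boundary, used to train the linear classifier
        on the features learned by the encoder
    """
    first = [i for i, y in enumerate(labels) if y < boundary]
    second = [i for i, y in enumerate(labels) if y >= boundary]
    return first, second


# Toy example: six test examples with class labels 0..9
labels = [0, 5, 2, 9, 4, 7]
probe_train, probe_eval = split_by_class(labels)
```

The two index sets are disjoint by construction, so no class used for joint training leaks into the linear-probe evaluation.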
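The cosine annealing scheduler named in the experiment setup follows a standard closed form (the one implemented by PyTorch's `CosineAnnealingLR`): the learning rate decays from its initial value to a floor along a half cosine wave over `T_max` epochs. A minimal sketch with the quoted hyperparameters, assuming a floor of `eta_min = 0` (the common default; the paper does not state it):

```python
import math

def cosine_annealing_lr(epoch, eta_max=3e-4, eta_min=0.0, t_max=100):
    """Learning rate at a given epoch under cosine annealing:
        eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * t / T_max))
    eta_max = 3e-4 and t_max = 100 match the quoted setup; eta_min = 0
    is an assumed default, not stated in the paper.
    """
    return eta_min + 0.5 * (eta_max - eta_min) * (
        1 + math.cos(math.pi * epoch / t_max)
    )

# The schedule starts at the full learning rate, reaches half of it at
# the midpoint, and decays to eta_min at the final epoch.
schedule = [cosine_annealing_lr(t) for t in range(101)]
```

Because the curve is flat near both endpoints, most of the decay happens in the middle of training, which is the usual motivation for this scheduler over a linear ramp.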