Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

A Statistical Theory of Contrastive Learning via Approximate Sufficient Statistics

Authors: Licong Lin, Song Mei

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this work, we develop a new theoretical framework for analyzing data augmentation-based contrastive learning, with a focus on SimCLR as a representative example. ... We conduct synthetic experiments to compare contrastive learning losses in Section 5.
Researcher Affiliation	Academia	Licong Lin UC Berkeley EMAIL Song Mei UC Berkeley EMAIL
Pseudocode	No	The paper describes methods and proofs using mathematical formulations but does not contain a distinct pseudocode or algorithm block.
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [NA] Justification: we did not run any experiment in this paper.
Open Datasets	Yes	We evaluated the models based on their zero-shot classification performance on the ImageNet-1k validation set (1000 classes, 500 images per class). ... We use the CLIP model (RN50-quickgelu, which consists of a ResNet-50 image encoder and 12-layer Transformer text encoder) on a 100K subsample of the cc3m-wds dataset [33]...
Dataset Splits	Yes	We evaluated the models based on their zero-shot classification performance on the ImageNet-1k validation set (1000 classes, 500 images per class).
Hardware Specification	No	The paper describes the experimental setup and parameters but does not specify any particular hardware (GPU/CPU models, memory, etc.) used for running the experiments.
Software Dependencies	No	The paper mentions optimizers like 'Adam' and 'AdamW' but does not specify versions for any programming languages or software libraries used in the implementation.
Experiment Setup	Yes	We set s = 10, d = 100, n = 500, hidden dimension 64, and batch size K = 64. The encoder is trained using Adam (learning rate 0.001) for 1000 epochs until convergence. ... We used a batch size of 128 and the AdamW optimizer with weight decay 0.02, and selected the best learning rate via grid search from {3e-5, 1e-4, 3e-4, 1e-3}.
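
The hyperparameters quoted in the Experiment Setup row could be recorded as a small configuration sketch like the one below. This is only an illustration, not the authors' code: the class names `SyntheticSetup` and `ClipSetup`, the field names, and the helper `grid_configs` are hypothetical, and the meanings of s, d, and n are not given in the excerpt above, so they are kept as bare symbols.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SyntheticSetup:
    """Values quoted for the synthetic experiment (names are hypothetical)."""
    s: int = 10             # symbol from the quote; meaning not stated in the excerpt
    d: int = 100            # symbol from the quote; meaning not stated in the excerpt
    n: int = 500            # symbol from the quote; meaning not stated in the excerpt
    hidden_dim: int = 64
    batch_size: int = 64    # K in the quote
    optimizer: str = "Adam"
    learning_rate: float = 1e-3
    epochs: int = 1000


@dataclass(frozen=True)
class ClipSetup:
    """Values quoted for the CLIP experiment (names are hypothetical)."""
    batch_size: int = 128
    optimizer: str = "AdamW"
    weight_decay: float = 0.02
    lr_grid: Tuple[float, ...] = (3e-5, 1e-4, 3e-4, 1e-3)


def grid_configs(setup: ClipSetup) -> List[Dict[str, object]]:
    """Expand the learning-rate grid into one config dict per candidate."""
    return [
        {
            "batch_size": setup.batch_size,
            "optimizer": setup.optimizer,
            "weight_decay": setup.weight_decay,
            "learning_rate": lr,
        }
        for lr in setup.lr_grid
    ]


if __name__ == "__main__":
    for cfg in grid_configs(ClipSetup()):
        print(cfg)
```

Each dict returned by `grid_configs` corresponds to one candidate run in the learning-rate grid search; how the best run was chosen is not specified in the quoted text.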