Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Generalization Analysis for Supervised Contrastive Representation Learning under Non-IID Settings
Authors: Nong Minh Hieu, Antoine Ledent
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we describe the numerical experiments on both synthetic and open sourced datasets to empirically verify our main results. Specifically, we aim to provide empirical evidence to corroborate three hypotheses. ... We summarize our experiment results in Figure 3 and Figure 2. |
| Researcher Affiliation | Academia | 1School of Computing and Information Systems, Singapore Management University. Correspondence to: Nong Minh Hieu <EMAIL>. |
| Pseudocode | No | The paper describes methods and problem formulations using mathematical notation and text, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about the release of source code, nor does it provide links to a code repository or mention code in supplementary materials. |
| Open Datasets | Yes | In this section, we describe the numerical experiments on both synthetic and open sourced datasets to empirically verify our main results. Specifically, we aim to provide empirical evidence to corroborate three hypotheses. ... In this section, we describe two experiments conducted on the MNIST dataset. |
| Dataset Splits | No | The paper mentions conducting experiments on the MNIST dataset and synthetic data, and details the number of tuples used for training (n=100 or n=10000, and M sub-sampled tuples). However, it does not specify how these datasets are split into training, validation, or test sets for evaluating model performance. |
| Hardware Specification | No | The paper describes training neural networks for experiments but does not provide specific hardware details such as GPU or CPU models, or cloud computing specifications. |
| Software Dependencies | No | The paper does not list any specific software dependencies or their version numbers required to reproduce the experiments. |
| Experiment Setup | No | The paper mentions using a 'shallow neural network (with a fixed number of layers L = 2)' and training with '5 random weights initializations'. It also describes synthetic data generation parameters like 'fixed standard deviation of σ = 0.1'. However, it does not provide key hyperparameters for training, such as learning rate, batch size, optimizer, or number of epochs, which are crucial for reproducing the experimental setup. |
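Since the paper releases no code (see the table above), the sketch below is a hypothetical illustration only: it implements the standard supervised contrastive (SupCon) loss of Khosla et al. (2020), the loss family this paper's generalization bounds concern. The function name, NumPy implementation, and the default temperature of 0.1 are assumptions for illustration, not the authors' setup.

```python
import numpy as np

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Standard SupCon loss over a batch of L2-normalized embeddings.

    features: (n, d) array of embeddings (assumed unit-norm).
    labels:   (n,) integer class labels.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    n = features.shape[0]
    # Pairwise cosine-similarity logits, scaled by temperature.
    logits = features @ features.T / temperature
    # Exclude each anchor's self-similarity from the softmax denominator.
    mask_other = ~np.eye(n, dtype=bool)
    logits_max = logits.max(axis=1, keepdims=True)  # numerical stability
    exp_logits = np.exp(logits - logits_max) * mask_other
    log_prob = (logits - logits_max) - np.log(exp_logits.sum(axis=1, keepdims=True))
    # Positives: same label as the anchor, excluding the anchor itself.
    pos_mask = (labels[:, None] == labels[None, :]) & mask_other
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0  # anchors with at least one positive
    mean_log_prob_pos = (log_prob * pos_mask).sum(axis=1)[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()
```

With well-separated class clusters the loss is near zero, and it grows when labels disagree with the embedding geometry, which is the behavior a reproduction of the paper's MNIST experiments would need to verify.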