Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Towards Memorization Estimation: Fast, Formal and Free
Authors: Deepak Ravikumar, Efstathia Soufleri, Abolfazl Hashemi, Kaushik Roy
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our theory with experiments and show that the proposed proxy has a very high cosine similarity with the memorization score from (Feldman & Zhang, 2020). We validate our theory through experiments on deep vision models, demonstrating the efficacy of CSL as a strong memorization proxy. We showcase the practical applications of our proxy in identifying mislabeled examples and duplicates in datasets, achieving state-of-the-art performance in these tasks. |
| Researcher Affiliation | Academia | 1Department of Electrical and Computer Engineering, Purdue University, West Lafayette, U.S.A. Correspondence to: Deepak Ravikumar <EMAIL>. |
| Pseudocode | No | The paper contains theoretical definitions, theorems, and proofs but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present any structured step-by-step procedures formatted like code. |
| Open Source Code | Yes | Link to the implementation: https://github.com/DeepakTatachar/CSL-Mem. To improve reproducibility, we have provided the code for all the experiments at https://github.com/DeepakTatachar/CSL-Mem. |
| Open Datasets | Yes | Datasets. We use CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009) and ImageNet (Russakovsky et al., 2015) datasets. |
| Dataset Splits | No | The paper mentions using CIFAR-10, CIFAR-100, and ImageNet datasets for training and testing but does not explicitly provide details about the training, validation, and test splits (e.g., percentages, sample counts, or specific methodology) for their own experiments. While standard benchmarks often have predefined splits, the paper does not specify which standard splits were used or how they were applied in their context. |
| Hardware Specification | No | The paper describes training models and conducting experiments but does not provide specific details about the hardware used, such as GPU or CPU models, memory, or other computing resources. |
| Software Dependencies | No | The paper mentions using specific machine learning architectures (ResNet18) and a library (cleanlab) but does not provide explicit version numbers for programming languages, machine learning frameworks (e.g., PyTorch, TensorFlow), or other key software dependencies. |
| Experiment Setup | Yes | Training. When training models on CIFAR-10 and CIFAR-100, the initial learning rate was set to 0.1 and models were trained for 200 epochs. The learning rate is decreased by a factor of 10 at epochs 120 and 180. When training on the CIFAR-10 and CIFAR-100 datasets, the batch size is set to 128. We use stochastic gradient descent for training with momentum set to 0.9 and weight decay set to 1e-4. For both CIFAR-10 and CIFAR-100, we used the following sequence of data augmentations for training: resize (32×32), random crop, and random horizontal flip, followed by normalization. For ImageNet we trained a ResNet18 for 200 epochs with the same settings, except the resize/random crop was set to 224×224. |
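The learning-rate schedule reported in the Experiment Setup row (start at 0.1, divide by 10 at epochs 120 and 180, 200 epochs total) can be sketched as a small step-decay function. This is an illustrative reconstruction, not code from the paper's repository; the function name and signature are assumptions. In PyTorch the same schedule would typically be expressed with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[120, 180], gamma=0.1)`.

```python
def step_decay_lr(epoch, base_lr=0.1, milestones=(120, 180), gamma=0.1):
    """Step-decay schedule as described in the paper's training setup:
    the learning rate starts at base_lr and is multiplied by gamma
    (i.e., divided by 10) at each milestone epoch."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# Epochs 0-119 train at 0.1, epochs 120-179 at 0.01, epochs 180-199 at 0.001.
schedule = [step_decay_lr(e) for e in range(200)]
```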