Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

On Traceability in $\ell_p$ Stochastic Convex Optimization

Authors: Sasha Voitovych, Mahdi Haghifam, Idan Attias, Gintare Karolina Dziugaite, Roi Livni, Dan Roy

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this paper, we investigate the necessity of traceability for accurate learning in stochastic convex optimization (SCO) under $\ell_p$ geometries. Informally, we say a learning algorithm is $m$-traceable if, by analyzing its output, it is possible to identify at least $m$ of its training samples. Our main results uncover a fundamental tradeoff between traceability and excess risk in SCO. For every $p \in [1, \infty)$, we establish the existence of an excess risk threshold below which every sample-efficient learner is traceable with the number of samples which is a constant fraction of its training sample. For $p \in [1, 2]$, this threshold coincides with the best excess risk of differentially private (DP) algorithms, i.e., above this threshold, there exist algorithms that are not traceable, which corresponds to a sharp phase transition. For $p \in (2, \infty)$, this threshold instead gives novel lower bounds for DP learning, partially closing an open problem in this setup. En route to establishing these results, we prove a sparse variant of the fingerprinting lemma, which is of independent interest to the community.
Researcher Affiliation	Collaboration	Institute for Data, Systems, and Society, Massachusetts Institute of Technology; Khoury College of Computer Sciences, Northeastern University; University of Illinois at Chicago; Toyota Technological Institute at Chicago; Google DeepMind; Department of Electrical Engineering, Tel Aviv University; Department of Statistical Sciences, University of Toronto; Vector Institute
Pseudocode	No	The paper describes theoretical concepts and proof roadmaps, but does not present any structured pseudocode or algorithm blocks. For example, Section 2.3 'Roadmap of the proof' outlines technical elements conceptually without pseudocode.
Open Source Code	No	The paper does not contain any explicit statements about releasing source code for the methodology described. It is a theoretical paper without empirical experiments that would typically involve code release.
Open Datasets	No	The paper is theoretical and defines problem setups using mathematical constructs like 'Z is the data space' or '$Z_k = \{z \in \{0, 1\}^d : \|z\|_0 = k\}$'. It does not mention any specific publicly available datasets (e.g., CIFAR, MNIST) or provide links/citations to actual dataset repositories used in experiments.
Dataset Splits	No	As the paper is purely theoretical and does not conduct experiments with actual datasets, there is no mention of dataset splits such as training, validation, or test sets.
Hardware Specification	No	The paper is theoretical and does not report on any empirical experiments. Consequently, no specific hardware specifications (e.g., GPU models, CPU types, memory) used for running experiments are mentioned.
Software Dependencies	No	The paper is theoretical and does not describe computational experiments. Therefore, it does not list any specific software dependencies or their version numbers (e.g., Python, PyTorch, CUDA versions, or solver names) that would be needed to replicate experimental results.
Experiment Setup	No	The paper is theoretical and does not involve empirical experiments. As such, there are no details provided regarding experimental setup, hyperparameter values, model initialization, training schedules, or optimizer settings.