Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Online stochastic gradient descent on non-convex losses from high-dimensional inference

Authors: Gerard Ben Arous, Reza Gheissari, Aukosh Jagannath

JMLR 2021 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We illustrate our approach by applying it to a wide set of inference tasks such as phase retrieval and parameter estimation for generalized linear models, online PCA, and spiked tensor models, as well as to supervised learning for single-layer networks with general activation functions. ... For an illustration of this discussion, see Figures 2.1 and 2.2 for numerical experiments in the supervised learning setting.
Researcher Affiliation Academia Gerard Ben Arous EMAIL Courant Institute of Mathematical Sciences New York University New York, NY, USA; Reza Gheissari EMAIL Departments of Statistics and EECS University of California Berkeley, CA, USA; Aukosh Jagannath EMAIL Departments of Statistics and Actuarial Science and Applied Mathematics University of Waterloo Waterloo, ON, Canada
Pseudocode Yes Let X_t denote the output of the algorithm at time t, and let δ > 0 denote a step size parameter. The sequence of outputs of the algorithm is then given by the following procedure: X_0 = x_0; X̃_t = X_{t−1} − δ ∇L(X_{t−1}; Y^t); X_t = X̃_t / ‖X̃_t‖.
Open Source Code No The paper discusses various algorithms and their performance but does not provide any specific links to code repositories or state that code is made available in supplementary materials.
Open Datasets No The paper primarily uses synthetic data models, for example, assuming features (aℓ) are i.i.d. standard Gaussian vectors. No specific publicly available datasets with access information are mentioned.
Dataset Splits No The paper focuses on theoretical analysis and numerical experiments with synthetically generated data, and thus does not specify training/test/validation dataset splits. It discusses sample complexity 'M i.i.d. samples'.
Hardware Specification No The paper mentions 'numerical experiments' but does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for these experiments.
Software Dependencies No The paper describes mathematical models and algorithms but does not specify any software libraries, programming languages, or their versions used for implementation or analysis.
Experiment Setup Yes The algorithm uses a 'step size parameter δ > 0' and the initial point 'x_0 is possibly random, x_0 ∼ µ ∈ M_1(S^{N−1})'. In numerical experiments, it specifies 'N = 3000 and α = 100' or 'α = 30,000' and discusses 'random starts' and a 'warm start (m_0 = 0.5)'.
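The procedure quoted in the Pseudocode row (gradient step followed by projection back onto the sphere) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the quadratic loss, the planted-signal toy problem, the dimension, and the step size below are all assumptions chosen for demonstration.

```python
import numpy as np

def spherical_online_sgd(grad_loss, x0, samples, delta):
    """Online SGD on the unit sphere S^{N-1}, as in the Pseudocode row:
    take a stochastic gradient step of size delta on one fresh sample,
    then renormalize back onto the sphere."""
    x = x0 / np.linalg.norm(x0)
    for y in samples:
        x = x - delta * grad_loss(x, y)   # gradient step, one sample per iterate
        x = x / np.linalg.norm(x)         # project back to the sphere
    return x

# Hypothetical toy task (not from the paper): recover a planted unit
# vector v from noisy linear observations Y^t = (a, a.v + noise).
rng = np.random.default_rng(0)
N = 50
v = rng.normal(size=N)
v /= np.linalg.norm(v)

def grad_loss(x, sample):
    a, y = sample
    return (a @ x - y) * a                # gradient of 0.5 * (a.x - y)^2

samples = [(a, a @ v + 0.01 * rng.normal())
           for a in rng.normal(size=(2000, N))]

x_hat = spherical_online_sgd(grad_loss, rng.normal(size=N), samples,
                             delta=0.5 / N)  # step scaling with 1/N, as an assumption
print(abs(x_hat @ v))  # overlap with the planted signal; should approach 1
```

The 1/N step-size scaling mirrors the regime the paper analyzes, where the step size must shrink with dimension for the normalized iterates to track the population dynamics.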