High-dimensional limit theorems for SGD: Effective dynamics and critical scaling

Authors: Gérard Ben Arous, Reza Gheissari, Aukosh Jagannath

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate our approach on popular examples including estimation for spiked matrix and tensor models and classification via two-layer networks for binary and XOR-type Gaussian mixture models. These examples exhibit surprising phenomena, including multimodal timescales to convergence as well as convergence to sub-optimal solutions with probability bounded away from zero from random (e.g., Gaussian) initializations. Figure 1: Matrix PCA summary statistics in dim. n = 1500, run for 10n steps at λ = 0.8 < λ_c in (a)–(b) and λ = 1.2 > λ_c in (c)–(d); marked points indicate the stable fixed points of the systems. Panels (a) and (c) demonstrate the stable and unstable OU processes that arise as diffusive limits of the m variable, and (b) and (d) depict the trajectories in (m, r²). (A simulation in the spirit of Figure 1 is sketched after the table.) If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]
Researcher Affiliation | Academia | Gérard Ben Arous, Department of Mathematics, Courant Institute, New York University, New York, NY, benarous@cims.nyu.edu; Reza Gheissari, Miller Institute, Departments of EECS and Statistics, University of California, Berkeley, Berkeley, CA, gheissari@berkeley.edu; Aukosh Jagannath, Department of Statistics and Actuarial Science and Department of Applied Mathematics, University of Waterloo, Waterloo, ON, a.jagannath@uwaterloo.ca
Pseudocode | No | The paper describes mathematical formulations and processes but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See supplemental material.
Open Datasets | No | The paper uses data generated from theoretical models (spiked matrix models, Gaussian mixture models) rather than established, publicly available datasets. It describes the data-generating process ('Y = (y, X), where y is a Ber(1/2) random variable and, conditionally on y, we have X ∼ N((2y − 1)µ, I/λ)'). No concrete access information (link, DOI, etc.) for an external, publicly available dataset is provided. (A sketch of this sampling process appears after the table.)
Dataset Splits | No | The paper describes experimental results based on theoretical models but does not specify training, validation, or test data splits, whether as percentages or sample counts, for any external dataset.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory).
Software Dependencies | No | The paper does not list any specific software dependencies with version numbers (e.g., Python, PyTorch, specific solvers).
Experiment Setup | Yes | We consider online stochastic gradient descent with constant learning rate δ_n, which is given by X_ℓ = X_{ℓ−1} − δ_n ∇L_n(X_{ℓ−1}, Y_ℓ). The critical scaling for δ is then of order 1/n, and we obtain the following effective dynamics. Proposition 4.1. Let u_n be as in (4.2) and fix any λ > 0 and δ_n = c_δ/N. Then u_n(t) converges to the solution of the ODE system u̇_t = f(u_t) + g(u_t), initialized from lim_{n→∞} (u_n)_♯ µ_n, with: […] For large λ, this is indeed a quantitative approximation, as f, g exhibit locally Lipschitz dependence on λ^{−1}, so the corresponding dynamics converges as λ → ∞ by classical well-posedness results (see, e.g., [48]). Figure 3: XOR GMM in dim. N = 250 with λ = 1000 and c_δ = 0.1. (An online-SGD sketch follows the table.)
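
The matrix PCA experiment behind Figure 1 can be imitated with a short simulation. The sketch below is not the authors' code: the loss L(x, Y) = −⟨x, Yx⟩/2, the spherical normalization of x, and the reduced dimension n = 200 (the paper uses n = 1500) are illustrative assumptions; only the spiked-matrix form of the data, the 10n-step horizon, the δ_n = c_δ/n step size, and the overlap statistic m follow the paper's description.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200          # reduced from the paper's n = 1500 to keep the sketch fast
lam = 1.2        # signal-to-noise ratio; 1.2 > lambda_c as in panels (c)-(d)
c_delta = 1.0    # hypothetical step-size constant, delta_n = c_delta / n
delta = c_delta / n

v = rng.standard_normal(n)
v /= np.linalg.norm(v)                  # unit-norm spike direction

x = rng.standard_normal(n)
x *= np.sqrt(n) / np.linalg.norm(x)     # random initialization on the sphere of radius sqrt(n)

m_traj = []
for _ in range(10 * n):                 # 10n online steps, one fresh sample per step
    W = rng.standard_normal((n, n))
    W = (W + W.T) / np.sqrt(2 * n)      # symmetric noise matrix, entry variance 1/n
    Y = lam * np.outer(v, v) + W        # spiked-matrix sample
    x = x + delta * (Y @ x)             # descent step on the assumed loss -<x, Yx>/2
    x *= np.sqrt(n) / np.linalg.norm(x) # project back to |x| = sqrt(n)
    m_traj.append((v @ x) / np.sqrt(n)) # overlap summary statistic m

print(f"final overlap m = {m_traj[-1]:+.3f}")
```

Under these assumptions, runs with λ above threshold should see |m| drift toward a nonzero stable fixed point, while runs below threshold should fluctuate around zero, mirroring the stable and unstable behavior Figure 1 depicts.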
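The binary Gaussian mixture quoted under Open Datasets is fully specified by its sampling law, so the data can be regenerated directly. In the sketch below, the dimension, the unit-norm choice of µ, λ = 4, and the sample count are hypothetical; only the law y ~ Ber(1/2), X | y ~ N((2y − 1)µ, I/λ) comes from the paper.

```python
import numpy as np

def sample_binary_gmm(n_samples, mu, lam, rng):
    """Sample Y = (y, X): y ~ Ber(1/2) and X | y ~ N((2y - 1) * mu, I / lam)."""
    d = mu.shape[0]
    y = rng.integers(0, 2, size=n_samples)       # Ber(1/2) labels in {0, 1}
    signs = (2 * y - 1)[:, None]                 # map labels to {-1, +1}
    noise = rng.standard_normal((n_samples, d)) / np.sqrt(lam)  # covariance I / lam
    return y, signs * mu + noise

rng = np.random.default_rng(0)
d = 250                                          # hypothetical dimension
mu = np.ones(d) / np.sqrt(d)                     # hypothetical unit-norm mean direction
y, X = sample_binary_gmm(1000, mu, lam=4.0, rng=rng)
print(X.shape, y.mean())                         # (1000, 250), label mean near 0.5
```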
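The update rule and critical learning-rate scaling quoted under Experiment Setup translate into a generic online SGD loop. The sketch below uses a toy quadratic loss L(x, Y) = |x − Y|²/2 purely to make the loop runnable; that loss, the sampler, and c_δ = 1 are assumptions, while the one-fresh-sample-per-step structure and the δ_n = c_δ/n scaling follow the quoted text.

```python
import numpy as np

def online_sgd(grad, sample, x0, c_delta, n_steps):
    """Online SGD: X_l = X_{l-1} - delta_n * grad(X_{l-1}, Y_l), with delta_n = c_delta / n.

    Each step draws a fresh sample Y_l, so no data point is ever reused.
    """
    n = x0.shape[0]
    delta = c_delta / n           # critical step-size scaling, order 1/n
    x = x0.copy()
    for _ in range(n_steps):
        x = x - delta * grad(x, sample())
    return x

rng = np.random.default_rng(0)
n = 250
x_star = rng.standard_normal(n)                   # toy target parameter
sample = lambda: x_star + rng.standard_normal(n)  # noisy observations of x_star
grad = lambda x, y: x - y                         # gradient of |x - y|^2 / 2
x_hat = online_sgd(grad, sample, np.zeros(n), c_delta=1.0, n_steps=10 * n)
print(f"distance to target: {np.linalg.norm(x_hat - x_star):.3f}")
```

With δ_n = c_δ/n, the per-step noise contribution is O(1/n), which is what makes n steps of this chain track an ODE/SDE limit of the kind stated in Proposition 4.1.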