A Precise Characterization of SGD Stability Using Loss Surface Geometry
Authors: Gregory Dexter, Borja Ocejo, Sathiya Keerthi, Aman Gupta, Ayan Acharya, Rajiv Khanna
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | From Section 5 (Experiments): "In this section, we support our prior theorems by empirically evaluating the behavior of SGD on synthetic optimization problems with additively decomposable loss functions." |
| Researcher Affiliation | Collaboration | Gregory Dexter¹, Borja Ocejo², Sathiya Keerthi², Aman Gupta², Ayan Acharya² & Rajiv Khanna¹ (¹Purdue University, ²LinkedIn Corporation) |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | To ensure reproducibility, we include all our implementations in the supplementary material. |
| Open Datasets | No | The experiments use synthetic optimization problems based on the construction from the proof of Theorem 2, i.e., generated data; the paper provides no concrete access information and does not refer to any well-known public dataset. |
| Dataset Splits | No | The experiments run SGD on synthetic data to verify theoretical predictions about divergence, so there are no traditional training/validation/test splits; the focus is SGD behavior under specific conditions rather than model generalization on empirical datasets. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments were provided in the paper. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or solvers. |
| Experiment Setup | Yes | In this construction, we set $H_i = m\,e_1 e_1^T$ for all $i \in [\sigma]$ and $H_i = m\,e_{i-\sigma+1} e_{i-\sigma+1}^T$ otherwise, with $m = 2n/\sigma$. We set the dimension of the space to $n - \sigma + 1$... Across all experiments, we set $n = 100$. For each set of parameters $(B, \eta, \sigma)$, we determine whether the combination leads to divergence or not by executing SGD for a maximum of 1000 steps. Specifically, we classify a tuple as divergent if, in the majority of the five repetitions, the norm of the parameter vector $w$ increases by a factor of 1000. (A minimal simulation sketch of this protocol follows the table.) |
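
As a rough illustration of the divergence test quoted in the Experiment Setup row, here is a minimal NumPy sketch under stated assumptions. The rank-one Hessian construction and the divergence criterion (at most 1000 SGD steps, five repetitions, majority vote on a 1000x norm blow-up) follow the quoted text; the initialization scale, batch sampling without replacement, and the function names (`build_hessians`, `sgd_diverges`, `classify_divergent`) are our assumptions, not the authors' implementation, which the paper says is included in its supplementary material.

```python
import numpy as np

def build_hessians(n=100, sigma=10):
    """Construct the rank-one component Hessians from the quoted
    Theorem 2 construction: H_i = m * e_1 e_1^T for i in [sigma],
    H_i = m * e_{i-sigma+1} e_{i-sigma+1}^T otherwise, with
    m = 2n / sigma and ambient dimension d = n - sigma + 1."""
    d = n - sigma + 1
    m = 2 * n / sigma
    # Each H_i is m * e_k e_k^T, so it suffices to store the basis index k.
    idx = np.array([0] * sigma + list(range(1, d)))  # 0-based indices
    return idx, m, d

def sgd_diverges(eta, B, n=100, sigma=10, max_steps=1000,
                 blowup=1000.0, seed=0):
    """Run SGD on the quadratic loss L(w) = (1/n) sum_i (1/2) w^T H_i w
    and report divergence if ||w|| grows by the given factor within
    max_steps. The Gaussian initialization is an assumption; the excerpt
    above does not specify it."""
    rng = np.random.default_rng(seed)
    idx, m, d = build_hessians(n, sigma)
    w = rng.standard_normal(d)
    w0_norm = np.linalg.norm(w)
    for _ in range(max_steps):
        # Assumed: mini-batch sampled uniformly without replacement.
        batch = rng.choice(n, size=B, replace=False)
        # Mini-batch gradient: mean over i in the batch of H_i w,
        # where H_i w = m * w[k] * e_k for basis index k = idx[i].
        grad = np.zeros(d)
        for i in batch:
            k = idx[i]
            grad[k] += m * w[k]
        grad /= B
        w -= eta * grad
        if np.linalg.norm(w) > blowup * w0_norm:
            return True
    return False

def classify_divergent(eta, B, sigma, reps=5):
    """Majority vote over independent repetitions, per the quoted setup."""
    votes = sum(sgd_diverges(eta, B, sigma=sigma, seed=r)
                for r in range(reps))
    return votes > reps // 2
```

A call such as `classify_divergent(eta=0.5, B=8, sigma=10)` would evaluate one cell of a $(B, \eta, \sigma)$ sweep; the actual parameter grids used in the paper are not specified in the excerpt above.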