Stochastic Gradient/Mirror Descent: Minimax Optimality and Implicit Regularization

Authors: Navid Azizan, Babak Hassibi

Venue: ICLR 2019

Reproducibility Variable: Research Type
Result: Theoretical
LLM Response: "In this paper, we present an alternative explanation of the behavior of SGD, and more generally, the stochastic mirror descent (SMD) family of algorithms, which includes SGD as a special case. We do so by obtaining a fundamental identity for such algorithms (see Lemmas 2 and 5). Using these identities, we show that for general nonlinear models and general loss functions, when the step size is sufficiently small, SMD (and therefore also SGD) is the optimal solution of a certain minimax filtering (or online learning) problem."

Reproducibility Variable: Researcher Affiliation
Result: Academia
LLM Response: Navid Azizan, California Institute of Technology, Pasadena, CA 91125, azizan@caltech.edu; Babak Hassibi, California Institute of Technology, Pasadena, CA 91125, hassibi@caltech.edu.

Reproducibility Variable: Pseudocode
Result: No
LLM Response: The paper describes its algorithms through mathematical update equations (e.g., Eqs. 3, 13, and 15) but does not contain structured pseudocode or an algorithm block. (An illustrative sketch of the SMD update appears after this table.)

Reproducibility Variable: Open Source Code
Result: No
LLM Response: The paper makes no statement about releasing source code for the described methodology and includes no links to code repositories.

Reproducibility Variable: Open Datasets
Result: No
LLM Response: The paper defines a training dataset only conceptually, with the notation {(x_i, y_i) : i = 1, ..., n}, and provides no concrete access information, such as a link, DOI, repository name, or formal citation, for any publicly available dataset.

Reproducibility Variable: Dataset Splits
Result: No
LLM Response: The paper provides no dataset split information (exact percentages, sample counts, citations to predefined splits, or a detailed splitting methodology) that would be needed to reproduce the partitioning of data into training, validation, and test sets.

Reproducibility Variable: Hardware Specification
Result: No
LLM Response: The paper provides no hardware details (GPU/CPU models, processor types, memory amounts, or other machine specifications) for running its analysis or illustrative examples.

Reproducibility Variable: Software Dependencies
Result: No
LLM Response: The paper names no ancillary software, such as libraries or solvers with version numbers, that would be needed to replicate any computational aspect of the work.

Reproducibility Variable: Experiment Setup
Result: No
LLM Response: The paper discusses the step size η and mentions specific values in an illustrative example, but it does not provide comprehensive setup details, such as concrete hyperparameter values, training configurations, or system-level settings, for reproducible experiments.
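Since the paper presents its algorithms only as update equations (Eqs. 3, 13, and 15) rather than pseudocode, the following is a minimal NumPy sketch of the standard stochastic mirror descent update with a q-norm potential, under which q = 2 reduces the mirror map to the squared l2 norm and each step to plain SGD. All function names, data, and hyperparameter values below are illustrative assumptions, not taken from the paper.

import numpy as np

def grad_psi(w, q=2):
    # Gradient of the potential psi(w) = (1/q) * sum_j |w_j|^q.
    return np.sign(w) * np.abs(w) ** (q - 1)

def grad_psi_inv(z, q=2):
    # Inverse of grad_psi; for q = 2 this is the identity map.
    return np.sign(z) * np.abs(z) ** (1.0 / (q - 1))

def smd(X, y, eta=0.05, q=2, epochs=50, seed=0):
    # Stochastic mirror descent for a linear model y_i = <x_i, w>
    # with squared loss. Each step enforces
    #   grad_psi(w_new) = grad_psi(w) - eta * grad_loss_i(w),
    # the generic SMD recursion of which SGD is the q = 2 case.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            g = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5 * (<x_i, w> - y_i)^2
            w = grad_psi_inv(grad_psi(w, q) - eta * g, q)
    return w

# Toy check on noiseless synthetic data; q = 2 makes each step plain SGD.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
w_hat = smd(X, X @ w_true)
print(np.linalg.norm(w_hat - w_true))  # should be close to 0

For q ≠ 2 the loop is unchanged but the potential, and hence the implicit regularization of the point SMD converges to, differs; note that the paper's minimax-optimality results are stated for the regime where the step size eta is sufficiently small.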