Stochastic Gradient/Mirror Descent: Minimax Optimality and Implicit Regularization
Authors: Navid Azizan, Babak Hassibi
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we present an alternative explanation of the behavior of SGD, and more generally, the stochastic mirror descent (SMD) family of algorithms, which includes SGD as a special case. We do so by obtaining a fundamental identity for such algorithms (see Lemmas 2 and 5). Using these identities, we show that for general nonlinear models and general loss functions, when the step size is sufficiently small, SMD (and therefore also SGD) is the optimal solution of a certain minimax filtering (or online learning) problem. |
| Researcher Affiliation | Academia | Navid Azizan, California Institute of Technology, Pasadena, CA 91125, azizan@caltech.edu; Babak Hassibi, California Institute of Technology, Pasadena, CA 91125, hassibi@caltech.edu |
| Pseudocode | No | The paper describes the algorithms through mathematical update equations (e.g., Eqs. 3, 13, 15) but does not contain a structured pseudocode or algorithm block (an illustrative sketch of the update is given after this table). |
| Open Source Code | No | The paper does not provide any statement about releasing source code for the methodology described, nor does it include any links to code repositories. |
| Open Datasets | No | The paper defines a 'training dataset' conceptually with the notation {(x_i, y_i) : i = 1, ..., n} but does not provide concrete access information such as a specific link, DOI, repository name, or formal citation for any publicly available or open dataset. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce data partitioning for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its analysis or illustrative examples. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate any computational aspects. |
| Experiment Setup | No | The paper discusses the step size η and gives specific values in an illustrative example, but it does not provide the comprehensive setup details (concrete hyperparameter values, training configurations, or system-level settings) needed to reproduce experiments. |
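As noted in the Pseudocode row, the paper specifies stochastic mirror descent only through update equations of the form ∇ψ(w_i) = ∇ψ(w_{i-1}) − η ∇L_i(w_{i-1}), with SGD recovered as the special case ψ(w) = ½‖w‖². The sketch below is a minimal, hedged illustration of that update, assuming a linear model with squared loss and the quadratic potential; the function names, step size, and synthetic data are illustrative choices, not the paper's experiments.

```python
import numpy as np

def smd(grad_psi, grad_psi_inv, grad_loss, w0, data, eta=0.01, epochs=1):
    """Run stochastic mirror descent over (x_i, y_i) pairs.

    Implements the update grad_psi(w_i) = grad_psi(w_{i-1}) - eta * grad_loss(w_{i-1}),
    i.e., a gradient step taken in the dual (mirror) space induced by the
    potential psi, then mapped back to the primal space.
    """
    w = w0.copy()
    for _ in range(epochs):
        for x, y in data:
            w = grad_psi_inv(grad_psi(w) - eta * grad_loss(w, x, y))
    return w

# With psi(w) = 0.5 * ||w||^2, grad_psi is the identity map and SMD reduces to SGD.
identity = lambda w: w

# Squared loss on a linear model: L_i(w) = 0.5 * (y_i - x_i @ w)^2 (an assumed example).
def grad_sq_loss(w, x, y):
    return -(y - x @ w) * x

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = X @ w_true                      # noiseless synthetic regression data
data = list(zip(X, y))

w_hat = smd(identity, identity, grad_sq_loss, np.zeros(5), data, eta=0.05, epochs=50)
print(np.linalg.norm(w_hat - w_true))  # distance to w_true; should be near 0
```

Swapping in a different strictly convex potential ψ (supplying its gradient and inverse gradient) yields other members of the SMD family the paper analyzes; only the quadratic case shown here is SGD.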