The Implicit Bias of AdaGrad on Separable Data
Authors: Qian Qian, Xiaoyuan Qian
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove that the directions of AdaGrad iterates, with a sufficiently small constant step size, always converge. We formulate the asymptotic direction as the solution of a quadratic optimization problem (a generic form of such a problem is sketched after this table). This achieves a theoretical characterization of the implicit bias of AdaGrad, which also provides insight into why and how the factors involved, such as certain intrinsic properties of the dataset, the initialization, and the learning rate, affect the implicit bias. We introduce a novel approach to studying the bias of AdaGrad, based mainly on a geometric estimate of the directions of the updates, which does not depend on any computation of the convergence rate. |
| Researcher Affiliation | Academia | Qian Qian, Department of Statistics, Ohio State University, Columbus, OH 43210, USA, qian.216@osu.edu; Xiaoyuan Qian, School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning 116024, China, xyqian@dlut.edu.cn |
| Pseudocode | No | The paper presents mathematical equations for the AdaGrad iterates (e.g., equation (1) on page 3) but does not include any explicitly labeled pseudocode or algorithm blocks (a runnable sketch of the update rule appears after this table). |
| Open Source Code | No | The paper does not provide any statement about releasing source code or links to a code repository. |
| Open Datasets | No | The paper mentions a training dataset ("Let {(x_n, y_n) : n = 1, ..., N} be a training dataset") but does not provide access information (e.g., a specific name, link, or citation) for a publicly available or open dataset. |
| Dataset Splits | No | The paper does not provide specific information about training, validation, or test dataset splits. |
| Hardware Specification | No | The paper conducts "Numerical simulations" but does not specify any hardware details (e.g., CPU, GPU models, memory) used for these simulations. |
| Software Dependencies | No | The paper does not provide specific software dependencies (e.g., programming languages, libraries, or solvers) with version numbers that would be needed to replicate the experiments. |
| Experiment Setup | Yes | Given two hyperparameters ε, η > 0 and an initial point w(0) ∈ ℝ^p, we consider the diagonal AdaGrad iterates... Numerical simulations also reveal the differences among the asymptotic directions of AdaGrad iterates with various learning rates, as shown in Figure 2... with η = 0.1 and 0.5, respectively. (A hedged replication sketch follows this table.) |
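
The Research Type row quotes the paper's claim that the asymptotic direction solves a quadratic optimization problem. As a minimal sketch only, a generic "minimum weighted norm under margin constraints" problem of this kind looks as follows; the diagonal weight matrix D here is a placeholder for the dataset-, initialization-, and learning-rate-dependent quantity the paper actually derives, which this report does not reproduce:

```latex
% Generic minimum-weighted-norm problem under margin constraints.
% D is a stand-in for the paper's trajectory-dependent diagonal matrix,
% not the paper's exact statement.
\begin{equation*}
  \min_{w \in \mathbb{R}^p} \; w^{\top} D \, w
  \quad \text{subject to} \quad
  y_n \langle w, x_n \rangle \ge 1, \qquad n = 1, \dots, N.
\end{equation*}
```

For D equal to the identity this reduces to the hard-margin SVM direction that plain gradient descent is known to converge to on separable data; the paper's point, per the quotation above, is that AdaGrad's weighting generally differs, so its asymptotic direction generally differs from the max-margin one.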
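For the Pseudocode and Experiment Setup rows: the paper states the diagonal AdaGrad iterates only as an equation, not as an algorithm block. Below is a minimal runnable sketch of diagonal AdaGrad, assuming the empirical exponential loss on a linearly separable dataset; the loss choice, function names, and default values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def exp_loss_grad(w, X, y):
    """Gradient of the empirical exponential loss sum_n exp(-y_n <w, x_n>)."""
    margins = y * (X @ w)           # per-example margins, shape (N,)
    coeffs = -y * np.exp(-margins)  # derivative of each exp(-margin) term
    return X.T @ coeffs             # shape (p,)

def adagrad(X, y, w0, eta=0.1, eps=1e-8, num_iters=10_000):
    """Diagonal AdaGrad: w <- w - eta * g / sqrt(eps + running sum of g**2)."""
    w = np.asarray(w0, dtype=float).copy()
    h = np.zeros_like(w)            # coordinate-wise gradient-square accumulator
    for _ in range(num_iters):
        g = exp_loss_grad(w, X, y)
        h += g ** 2
        w -= eta * g / np.sqrt(eps + h)
    return w
```

The coordinate-wise accumulator `h` is what distinguishes this update from plain gradient descent and is the source of the learning-rate-dependent bias the paper analyzes.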
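A hedged replication of the Figure 2 comparison quoted in the Experiment Setup row: run the sketch above with η = 0.1 and η = 0.5 on the same data and compare the normalized directions of the final iterates. The synthetic dataset below is invented for illustration; the report does not reproduce the paper's simulation data.

```python
# Invented separable toy data: positive class shifted away from the origin,
# so the direction e1 separates it; fixed seed for reproducibility.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2)) + np.array([4.0, 0.0])
y = np.ones(20)

for eta in (0.1, 0.5):
    w = adagrad(X, y, np.zeros(2), eta=eta)
    print(f"eta={eta}: normalized direction ~ {w / np.linalg.norm(w)}")
```

Per the quotation, different learning rates can settle on visibly different normalized directions, which is the qualitative effect the paper's Figure 2 is said to illustrate.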