The Implicit Bias of AdaGrad on Separable Data

Authors: Qian Qian, Xiaoyuan Qian

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We prove that the directions of AdaGrad iterates, with a sufficiently small constant step size, always converge. We formulate the asymptotic direction as the solution of a quadratic optimization problem (sketched after the table below). This yields a theoretical characterization of the implicit bias of AdaGrad and provides insight into why and how the factors involved, such as certain intrinsic properties of the dataset, the initialization, and the learning rate, affect that bias. We introduce a novel approach to studying the bias of AdaGrad, based mainly on a geometric estimate of the directions of the updates that does not depend on any convergence-rate calculation.
Researcher Affiliation | Academia | Qian Qian, Department of Statistics, Ohio State University, Columbus, OH 43210, USA (qian.216@osu.edu); Xiaoyuan Qian, School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning 116024, China (xyqian@dlut.edu.cn)
Pseudocode | No | The paper presents mathematical equations for the AdaGrad iterates (e.g., equation (1) on page 3) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository.
Open Datasets | No | The paper mentions a training dataset ("Let {(x_n, y_n) : n = 1, ..., N} be a training dataset") but does not provide access information (e.g., a specific name, link, or citation) for a publicly available or open dataset.
Dataset Splits | No | The paper does not provide specific information about training, validation, or test dataset splits.
Hardware Specification | No | The paper conducts "numerical simulations" but does not specify any hardware details (e.g., CPU or GPU models, memory) used for them.
Software Dependencies | No | The paper does not list specific software dependencies (e.g., programming languages, libraries, or solvers) with version numbers that would be needed to replicate the simulations.
Experiment Setup | Yes | "Given two hyperparameters ε, η > 0 and an initial point w(0) ∈ R^p, we consider the diagonal AdaGrad iterates..." Numerical simulations also reveal the differences among the asymptotic directions of AdaGrad iterates with various learning rates, as shown in Figure 2, with η = 0.1 and 0.5, respectively (reproduced in spirit by the first sketch below).
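
The setup row above contains enough detail to reconstruct the simulation in spirit. Below is a minimal sketch of the diagonal AdaGrad iterates on a hypothetical linearly separable dataset, comparing the limit directions for the two learning rates reported for Figure 2 (η = 0.1 and 0.5). The exponential loss, the toy data, and the step budget are assumptions of this sketch; the paper publishes neither simulation code nor data.

```python
import numpy as np

def adagrad_direction(X, y, eta, eps=1e-8, steps=20000, w0=None):
    """Diagonal AdaGrad (the paper's eq. (1)) on the exponential loss
    L(w) = sum_n exp(-y_n <w, x_n>).  The loss is an assumption of this
    sketch; the paper's setting covers losses with exponential-type tails.
    Returns the direction w / ||w|| of the final iterate."""
    n, p = X.shape
    w = np.zeros(p) if w0 is None else np.asarray(w0, dtype=float).copy()
    acc = np.zeros(p)  # running sum of squared per-coordinate gradients
    for _ in range(steps):
        margins = y * (X @ w)
        grad = -(X.T @ (y * np.exp(-margins)))  # gradient of the exponential loss
        acc += grad ** 2
        w -= eta * grad / np.sqrt(eps + acc)    # per-coordinate adaptive step
    return w / np.linalg.norm(w)

# Hypothetical separable toy data: two well-separated Gaussian clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(20, 2)) + [3.0, 0.0],
               rng.normal(size=(20, 2)) - [3.0, 0.0]])
y = np.concatenate([np.ones(20), -np.ones(20)])

for eta in (0.1, 0.5):  # the two learning rates reported for Figure 2
    print(f"eta = {eta}: limit direction ~ {adagrad_direction(X, y, eta)}")
```

If the two printed directions differ, that is consistent with the Figure 2 observation quoted above: different learning rates can produce different asymptotic directions.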
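The quadratic characterization in the Research Type row can also be made concrete. As a sketch only: assuming the program takes the weighted max-margin form min ½ wᵀ diag(d) w subject to y_n⟨x_n, w⟩ ≥ 1 for all n (the paper's actual weighting depends on the AdaGrad trajectory; d here is a hypothetical stand-in), a generic constrained solver recovers the predicted direction. With uniform weights d = 1 the program reduces to the hard-margin SVM, the known implicit bias of plain gradient descent on separable data.

```python
import numpy as np
from scipy.optimize import minimize

def weighted_max_margin_direction(X, y, d):
    """Solve  min 0.5 * w^T diag(d) w  s.t.  y_n <x_n, w> >= 1  for all n,
    and return the normalized solution.  The diagonal weights d stand in
    for the paper's trajectory-dependent matrix (an assumed form)."""
    p = X.shape[1]
    constraints = [
        {"type": "ineq", "fun": lambda w, i=i: y[i] * (X[i] @ w) - 1.0}
        for i in range(len(y))
    ]
    res = minimize(
        fun=lambda w: 0.5 * w @ (d * w),
        jac=lambda w: d * w,
        x0=np.ones(p),                 # SLSQP tolerates an infeasible start
        method="SLSQP",
        constraints=constraints,
    )
    return res.x / np.linalg.norm(res.x)

# Tiny separable example; with d = 1 this is the hard-margin SVM direction.
X = np.array([[2.0, 1.0], [3.0, -1.0], [-2.0, -1.0], [-3.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
print(weighted_max_margin_direction(X, y, d=np.ones(2)))
```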