Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
The Implicit Bias of AdaGrad on Separable Data
Authors: Qian Qian, Xiaoyuan Qian
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove that the directions of AdaGrad iterates, with a sufficiently small constant step size, always converge. We formulate the asymptotic direction as the solution of a quadratic optimization problem. This achieves a theoretical characterization of the implicit bias of AdaGrad, which also provides insights into why and how the factors involved, such as certain intrinsic properties of the dataset, the initialization, and the learning rate, affect the implicit bias. We introduce a novel approach to study the bias of AdaGrad. It is mainly based on a geometric estimation of the directions of the updates, which doesn't depend on any calculation of the convergence rate. |
| Researcher Affiliation | Academia | Qian Qian Department of Statistics Ohio State University Columbus, OH 43210, USA EMAIL Xiaoyuan Qian School of Mathematical Sciences Dalian University of Technology Dalian, Liaoning 116024, China EMAIL |
| Pseudocode | No | The paper presents mathematical equations for the AdaGrad iterates (e.g., equation (1) on page 3) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or links to a code repository. |
| Open Datasets | No | The paper mentions a "training dataset" (Let {(x_n, y_n) : n = 1, …, N} be a training dataset) but does not provide access information (e.g., a specific name, link, or citation) for a publicly available or open dataset. |
| Dataset Splits | No | The paper does not provide specific information about training, validation, or test dataset splits. |
| Hardware Specification | No | The paper conducts "Numerical simulations" but does not specify any hardware details (e.g., CPU, GPU models, memory) used for these simulations. |
| Software Dependencies | No | The paper does not provide specific software dependencies (e.g., programming languages, libraries, or solvers) with version numbers that would be needed to replicate the experiments. |
| Experiment Setup | Yes | Given two hyperparameters ε, η > 0 and an initial point w(0) ∈ ℝ^p, we consider the diagonal AdaGrad iterates... Numerical simulations also reveal the differences among the asymptotic directions of AdaGrad iterates with various learning rates, as shown in Figure 2... with η = 0.1 and 0.5, respectively. |
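The experiment setup above describes the diagonal AdaGrad iterates studied in the paper, parameterized by ε, η, and an initial point w(0). As a rough illustration of that update rule, here is a minimal NumPy sketch of diagonal AdaGrad on a toy linearly separable dataset; the choice of exponential loss, the hyperparameter defaults, and the toy data are assumptions for illustration, not taken from the paper or its simulations.

```python
import numpy as np

def diagonal_adagrad(X, y, eta=0.1, eps=1e-8, steps=1000, w0=None):
    """Sketch of diagonal AdaGrad minimizing the exponential loss
    L(w) = sum_n exp(-y_n <x_n, w>) on separable data.
    NOTE: loss choice and defaults are illustrative assumptions."""
    N, p = X.shape
    w = np.zeros(p) if w0 is None else np.asarray(w0, dtype=float).copy()
    G = np.zeros(p)  # running sum of squared per-coordinate gradients
    for _ in range(steps):
        margins = y * (X @ w)
        # gradient of sum_n exp(-y_n x_n . w) with respect to w
        grad = -(X * (y * np.exp(-margins))[:, None]).sum(axis=0)
        G += grad ** 2
        # per-coordinate adaptive step: eta / sqrt(eps + G)
        w -= eta * grad / np.sqrt(eps + G)
    return w

# Toy linearly separable data in R^2 (hypothetical example)
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = diagonal_adagrad(X, y)
direction = w / np.linalg.norm(w)  # the normalized iterate whose limit the paper characterizes
```

The quantity of interest in the paper is the limiting direction `w / ||w||`, which the authors show solves a quadratic optimization problem depending on the data, the initialization, and the learning rate.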