Agnostic Learning of a Single Neuron with Gradient Descent
Authors: Spencer Frei, Yuan Cao, Quanquan Gu
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper provides a theoretical analysis of gradient descent when used for learning a single neuron. As a theoretical work, its potential for negative societal impact is extremely limited. On the other hand, our general lack of understanding of why gradient descent on large neural networks can find weights that have both small empirical risk and small population risk is worrisome given the widespread adoption of large neural networks in sensitive technology applications. Our main contributions can be summarized as follows. 1) Agnostic setting (Theorem 3.3). Without any assumptions on the relationship between y and x... we show that for any ε > 0, gradient descent finds a point w_t with population risk O(OPT) + ε with sample complexity O(ε^{-2}) and runtime O(ε^{-1})... 2) Noisy teacher network setting (Theorem 4.1). When y = σ(v⊤x) + ξ... we demonstrate that gradient descent finds w_t satisfying F(w_t) ≤ OPT + ε... |
| Researcher Affiliation | Academia | Spencer Frei, Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90095, USA (spencerfrei@ucla.edu); Yuan Cao, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA (yuancao@cs.ucla.edu); Quanquan Gu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA (qgu@cs.ucla.edu) |
| Pseudocode | No | The paper describes the gradient descent update rule mathematically in equation (3.1): w_{t+1} = w_t − η∇F̂(w_t) = w_t − (η/n) Σ_{i=1}^{n} (σ(w_t⊤x_i) − y_i) σ′(w_t⊤x_i) x_i. However, this is presented as a mathematical formula, not as structured pseudocode or an algorithm block. (A minimal illustrative sketch of this update is given after the table.) |
| Open Source Code | No | The paper does not mention releasing code or providing a link to a repository for the described methodology. It is a theoretical paper. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments using specific datasets. It refers to 'a set of i.i.d. samples S ∼ D^n' as part of its theoretical model, not as an actual dataset used for training that would be publicly available. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments with dataset splits for training, validation, or testing. Therefore, it does not provide specific dataset split information. |
| Hardware Specification | No | The paper is purely theoretical and does not describe any experimental setup or the hardware used to run experiments. |
| Software Dependencies | No | The paper is purely theoretical and does not describe any experimental setup or list software dependencies with specific version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe a concrete experimental setup with hyperparameters or system-level training settings. It describes the gradient descent algorithm and its theoretical analysis but not an implementation for empirical evaluation. |
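
Although the paper provides no pseudocode or implementation, the update rule quoted above is concrete enough to sketch. Below is a minimal, illustrative Python rendering of update (3.1) applied to a synthetic noisy-teacher instance y = σ(v⊤x) + ξ. The dimension, sample size, step size, iteration count, Gaussian data model, noise level, ReLU activation, and random initialization are all assumptions made here for illustration; none of these settings come from the paper, which is purely theoretical.

```python
import numpy as np

# Illustrative sketch of gradient descent update (3.1) on a synthetic
# noisy-teacher instance y = sigma(v^T x) + xi. All constants below
# (d, n, eta, T, noise scale, data model) are hypothetical choices.

rng = np.random.default_rng(0)

d, n = 20, 5000      # input dimension and sample size (illustrative)
eta, T = 0.1, 200    # step size and number of iterations (illustrative)

def sigma(z):
    # ReLU activation, one of the activations covered by the paper's analysis
    return np.maximum(z, 0.0)

def sigma_prime(z):
    # (sub)gradient of the ReLU
    return (z > 0).astype(float)

# Synthetic noisy-teacher data: x ~ N(0, I_d), y = sigma(v^T x) + noise
v = rng.normal(size=d)
v /= np.linalg.norm(v)
X = rng.normal(size=(n, d))
y = sigma(X @ v) + 0.1 * rng.normal(size=n)

# Gradient descent on the empirical risk F_hat(w) = (1/2n) sum_i (sigma(w^T x_i) - y_i)^2
w = rng.normal(size=d) / np.sqrt(d)   # small random init (assumption; avoids the zero-gradient point of the ReLU)
for t in range(T):
    z = X @ w
    residual = sigma(z) - y
    # (1/n) sum_i (sigma(w^T x_i) - y_i) * sigma'(w^T x_i) * x_i
    grad = (X * (residual * sigma_prime(z))[:, None]).mean(axis=0)
    # update (3.1): w_{t+1} = w_t - eta * grad F_hat(w_t)
    w = w - eta * grad

print("final empirical risk:", 0.5 * np.mean((sigma(X @ w) - y) ** 2))
print("cosine to teacher v :", float(w @ v / (np.linalg.norm(w) * np.linalg.norm(v) + 1e-12)))
```

This sketch only instantiates the update rule the paper analyzes; it is not an attempt to reproduce the paper's theoretical guarantees, which concern population risk relative to OPT rather than any particular synthetic run.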