Agnostic Learning of a Single Neuron with Gradient Descent

Authors: Spencer Frei, Yuan Cao, Quanquan Gu

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper provides a theoretical analysis of gradient descent when used for learning a single neuron. As a theoretical work, its potential risk for negative societal impacts is extremely limited. On the other hand, our general lack of understanding of why gradient descent on large neural networks can find weights that have both small empirical risk and also small population risk is worrisome given the widespread adoption of large neural networks in sensitive technology applications. Our main contributions can be summarized as follows. 1) Agnostic setting (Theorem 3.3). Without any assumptions on the relationship between y and x... we show that for any ε > 0, gradient descent finds a point w_t with population risk O(OPT) + ε with sample complexity O(ε^{-2}) and runtime O(ε^{-1})... 2) Noisy teacher network setting (Theorem 4.1). When y = σ(v^⊤ x) + ξ... we demonstrate that gradient descent finds w_t satisfying F(w_t) ≤ OPT + ε... (The risk F and the benchmark OPT quoted here are sketched in the note after this table.)
Researcher Affiliation | Academia | Spencer Frei, Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90095, USA (spencerfrei@ucla.edu); Yuan Cao, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA (yuancao@cs.ucla.edu); Quanquan Gu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA (qgu@cs.ucla.edu)
Pseudocode | No | The paper describes the gradient descent update rule mathematically in equation (3.1): w_{t+1} = w_t − η∇F̂(w_t) = w_t − (η/n) Σ_{i=1}^{n} (σ(w_t^⊤ x_i) − y_i) σ′(w_t^⊤ x_i) x_i. However, this is presented as a mathematical formula, not as structured pseudocode or an algorithm block (a code sketch of this update appears after the table).
Open Source Code | No | The paper does not mention releasing code or providing a link to a repository for the described methodology. It is a theoretical paper.
Open Datasets | No | The paper is theoretical and does not conduct experiments using specific datasets. It refers to 'a set of i.i.d. samples S ∼ D^n' as part of its theoretical model, not as an actual dataset used for training that would be publicly available.
Dataset Splits | No | The paper is theoretical and does not conduct experiments with dataset splits for training, validation, or testing. Therefore, it does not provide specific dataset split information.
Hardware Specification | No | The paper is purely theoretical and does not describe any experimental setup or the hardware used to run experiments.
Software Dependencies | No | The paper is purely theoretical and does not describe any experimental setup or list software dependencies with specific version numbers.
Experiment Setup | No | The paper is theoretical and does not describe a concrete experimental setup with hyperparameters or system-level training settings. It describes the gradient descent algorithm and its theoretical analysis but not an implementation for empirical evaluation.
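Note on the quantities quoted in the Research Type row: the summary above references the population risk F and the benchmark OPT without defining them. The following is a minimal sketch based on the standard single-neuron squared-loss setup; the exact normalization and the unit-norm constraint set are assumptions made here, not quotes from the paper.

```latex
% Population risk of a neuron w under squared loss (assumed standard form):
%   F(w) = (1/2) E_{(x,y) ~ D} [ (sigma(w^T x) - y)^2 ]
% Benchmark risk over unit-norm neurons (assumed constraint set):
%   OPT = min_{||v|| <= 1} F(v)
% The guarantees quoted above then read:
%   agnostic setting:      F(w_t) <= O(OPT) + eps
%   noisy teacher setting: F(w_t) <= OPT + eps
\[
  F(w) \;=\; \tfrac{1}{2}\,\mathbb{E}_{(x,y)\sim\mathcal{D}}\!\left[\bigl(\sigma(w^{\top}x)-y\bigr)^{2}\right],
  \qquad
  \mathrm{OPT} \;=\; \min_{\|v\|_{2}\le 1} F(v).
\]
```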
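As an illustration of the update in equation (3.1), here is a minimal Python sketch for a single ReLU neuron. The activation choice, zero initialization, step size, and synthetic data below are assumptions made for the example, not details taken from the paper.

```python
import numpy as np

def relu(z):
    """Activation sigma(z) = max(z, 0)."""
    return np.maximum(z, 0.0)

def relu_prime(z):
    """Subgradient convention sigma'(z) = 1{z > 0}."""
    return (z > 0.0).astype(float)

def gradient_descent_single_neuron(X, y, eta=0.1, num_steps=1000):
    """Iterate w_{t+1} = w_t - (eta/n) * sum_i (sigma(w_t.x_i) - y_i) * sigma'(w_t.x_i) * x_i."""
    n, d = X.shape
    w = np.zeros(d)  # zero initialization (assumption; the paper's initialization may differ)
    for _ in range(num_steps):
        z = X @ w                                # pre-activations w_t^T x_i, shape (n,)
        residual = relu(z) - y                   # sigma(w_t^T x_i) - y_i
        grad = (X * (residual * relu_prime(z))[:, None]).mean(axis=0)
        w = w - eta * grad
    return w

# Toy usage on synthetic noisy-teacher data: y_i = sigma(v.x_i) + noise.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 500, 10
    X = rng.standard_normal((n, d)) / np.sqrt(d)   # roughly unit-norm inputs
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)                          # unit-norm teacher neuron
    y = relu(X @ v) + 0.05 * rng.standard_normal(n)
    w = gradient_descent_single_neuron(X, y)
    print("empirical risk:", 0.5 * np.mean((relu(X @ w) - y) ** 2))
```

Vectorizing the per-sample terms keeps the update identical to the averaged sum in (3.1) while keeping the sketch short.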