Agnostic Learning of a Single Neuron with Gradient Descent
Authors: Spencer Frei, Yuan Cao, Quanquan Gu
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper provides a theoretical analysis of gradient descent when used for learning a single neuron. As a theoretical work, its potential for negative societal impact is extremely limited. On the other hand, our general lack of understanding of why gradient descent on large neural networks can find weights that have both small empirical risk and small population risk is worrisome given the widespread adoption of large neural networks in sensitive technology applications. Our main contributions can be summarized as follows. 1) Agnostic setting (Theorem 3.3). Without any assumptions on the relationship between y and x... we show that for any ε > 0, gradient descent finds a point w_t with population risk O(OPT) + ε with sample complexity O(ε^{-2}) and runtime O(ε^{-1})... 2) Noisy teacher network setting (Theorem 4.1). When y = σ(v⊤x) + ξ... we demonstrate that gradient descent finds w_t satisfying F(w_t) ≤ OPT + ε... |
| Researcher Affiliation | Academia | Spencer Frei, Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90095, USA (spencerfrei@ucla.edu); Yuan Cao, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA (yuancao@cs.ucla.edu); Quanquan Gu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA (qgu@cs.ucla.edu) |
| Pseudocode | No | The paper describes the gradient descent update rule mathematically in equation (3.1): w_{t+1} = w_t − η∇F̂(w_t) = w_t − (η/n) Σ_{i=1}^{n} (σ(w_t⊤x_i) − y_i) σ′(w_t⊤x_i) x_i. However, this is presented as a mathematical formula, not as structured pseudocode or an algorithm block. (A minimal illustrative sketch of this update is given after the table.) |
| Open Source Code | No | The paper does not mention releasing code or providing a link to a repository for the described methodology. It is a theoretical paper. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments using specific datasets. It refers to 'a set of i.i.d. samples S ∼ D^n' as part of its theoretical model, not as an actual dataset used for training that would be publicly available. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments with dataset splits for training, validation, or testing. Therefore, it does not provide specific dataset split information. |
| Hardware Specification | No | The paper is purely theoretical and does not describe any experimental setup or the hardware used to run experiments. |
| Software Dependencies | No | The paper is purely theoretical and does not describe any experimental setup or list software dependencies with specific version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe a concrete experimental setup with hyperparameters or system-level training settings. It describes the gradient descent algorithm and its theoretical analysis but not an implementation for empirical evaluation. |
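
Although the paper provides no pseudocode or implementation, the update rule quoted above is concrete enough to sketch. Below is a minimal, illustrative Python rendering of update (3.1) applied to a synthetic noisy-teacher instance y = σ(v⊤x) + ξ. The dimension, sample size, step size, iteration count, Gaussian data model, noise level, ReLU activation, and random initialization are all assumptions made here for illustration; none of these settings come from the paper, which is purely theoretical.

```python
import numpy as np

# Illustrative sketch of gradient descent update (3.1) on a synthetic
# noisy-teacher instance y = sigma(v^T x) + xi. All constants below
# (d, n, eta, T, noise scale, data model) are hypothetical choices.

rng = np.random.default_rng(0)

d, n = 20, 5000      # input dimension and sample size (illustrative)
eta, T = 0.1, 200    # step size and number of iterations (illustrative)

def sigma(z):
    # ReLU activation, one of the activations covered by the paper's analysis
    return np.maximum(z, 0.0)

def sigma_prime(z):
    # (sub)gradient of the ReLU
    return (z > 0).astype(float)

# Synthetic noisy-teacher data: x ~ N(0, I_d), y = sigma(v^T x) + noise
v = rng.normal(size=d)
v /= np.linalg.norm(v)
X = rng.normal(size=(n, d))
y = sigma(X @ v) + 0.1 * rng.normal(size=n)

# Gradient descent on the empirical risk F_hat(w) = (1/2n) sum_i (sigma(w^T x_i) - y_i)^2
w = rng.normal(size=d) / np.sqrt(d)   # small random init (assumption; avoids the zero-gradient point of the ReLU)
for t in range(T):
    z = X @ w
    residual = sigma(z) - y
    # (1/n) sum_i (sigma(w^T x_i) - y_i) * sigma'(w^T x_i) * x_i
    grad = (X * (residual * sigma_prime(z))[:, None]).mean(axis=0)
    # update (3.1): w_{t+1} = w_t - eta * grad F_hat(w_t)
    w = w - eta * grad

print("final empirical risk:", 0.5 * np.mean((sigma(X @ w) - y) ** 2))
print("cosine to teacher v :", float(w @ v / (np.linalg.norm(w) * np.linalg.norm(v) + 1e-12)))
```

This sketch only instantiates the update rule the paper analyzes; it is not an attempt to reproduce the paper's theoretical guarantees, which concern population risk relative to OPT rather than any particular synthetic run.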