Robustly Learning a Single Neuron via Sharpness
Authors: Puqian Wang, Nikos Zarifis, Ilias Diakonikolas, Jelena Diakonikolas
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the problem of learning a single neuron with respect to the $L_2^2$-loss in the presence of adversarial label noise. We give an efficient algorithm that, for a broad family of activations including ReLUs, approximates the optimal $L_2^2$-error within a constant factor. Notably, our algorithm succeeds under much milder distributional assumptions compared to prior work. The key ingredient enabling our results is a novel connection to local error bounds from optimization theory. Our algorithm is extremely simple: it amounts to mini-batch SGD on a natural convex surrogate of the problem. As we will explain subsequently, this convex surrogate has been studied before in closely related yet more restricted contexts. Our main technical contribution lies in the analysis, which hinges on a new connection to local error bounds from the theory of optimization. |
| Researcher Affiliation | Academia | Department of Computer Sciences, University of Wisconsin-Madison, Madison, USA. Correspondence to: Nikos Zarifis <zarifis@wisc.edu>. |
| Pseudocode | Yes | Algorithm 1 Stochastic Gradient Descent on Surrogate Loss (Page 5) and Algorithm 2 Stochastic Gradient Descent on Surrogate Loss (Appendix D.2). |
| Open Source Code | No | The paper does not contain any statement about releasing source code for the described methodology, nor does it provide any links to a code repository. |
| Open Datasets | No | The paper discusses theoretical properties of 'unknown distribution D' and various distribution classes (e.g., 'heavy-tailed data', 'discrete Gaussians', 'uniform distribution over the cube'), but it does not specify or provide access information (e.g., link, DOI, formal citation) for any publicly available or open dataset used for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not present empirical experiments that would involve specifying training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe running experiments. Therefore, it does not provide any specific hardware specifications (e.g., GPU/CPU models, memory details) used for computational tasks. |
| Software Dependencies | No | The paper is theoretical and presents algorithmic analysis without detailing an implementation. Therefore, it does not specify any software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9'). |
| Experiment Setup | Yes | Algorithm 1 Stochastic Gradient Descent on Surrogate Loss. Input: Iterations $T$, sample access from $\mathcal{D}$, batch size $N$, step size $\eta$, bound $M$. Then after $T = \widetilde{\Theta}\big((B^2\alpha^2/(\rho^2\mu^2))\log(W/\epsilon)\big)$ iterations with batch size $N = \Omega\big(dT(r_\epsilon^2 + \alpha^2 M^2)\big)$, Algorithm 1 converges to a point $w^{(T)}$ such that $\mathcal{L}_{\mathcal{D},\sigma}^{2}(w^{(T)}) = O\big(B^2\alpha^2/(\rho^2\mu^2)\big)\,\mathrm{OPT} + \epsilon$, with probability at least $2/3$. |
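
For readers who want a concrete picture of the procedure quoted above, the following is a minimal, hedged Python sketch of mini-batch SGD on a convex surrogate loss for a single ReLU neuron. It assumes the standard GLMtron-style surrogate whose stochastic gradient at $w$ is $(\sigma(w\cdot x) - y)\,x$; the projection radius `W`, the treatment of the norm bound `M` as a sample-truncation threshold, and all function names are illustrative assumptions, not the paper's exact specification.

```python
# Hedged sketch of mini-batch SGD on a convex surrogate loss for a single ReLU
# neuron (in the spirit of Algorithm 1).  The surrogate gradient
# (sigma(w.x) - y) * x is the standard GLMtron-style form; the handling of the
# bound M and the projection radius W is a simplifying assumption.
import numpy as np


def relu(t):
    return np.maximum(t, 0.0)


def surrogate_sgd(sample_batch, d, T=1000, N=128, eta=0.05, W=10.0, M=None):
    """Mini-batch SGD on a convex surrogate of the single-neuron L_2^2 loss.

    sample_batch(N) -> (X, y) with X of shape (N, d) and y of shape (N,).
    W is an assumed radius for projecting the iterate; M is an assumed norm
    bound used to discard overly large samples, loosely mirroring the
    'bound M' input of Algorithm 1.
    """
    w = np.zeros(d)
    for _ in range(T):
        X, y = sample_batch(N)
        if M is not None:
            # Assumed truncation step: drop samples with ||x|| > M.
            keep = np.linalg.norm(X, axis=1) <= M
            X, y = X[keep], y[keep]
            if len(y) == 0:
                continue
        # Stochastic gradient of the convex surrogate: mean of (sigma(w.x) - y) x.
        grad = ((relu(X @ w) - y)[:, None] * X).mean(axis=0)
        w = w - eta * grad
        # Project the iterate back onto the ball of radius W.
        norm = np.linalg.norm(w)
        if norm > W:
            w = w * (W / norm)
    return w


# Toy usage: labels from a planted ReLU neuron, with a small fraction of
# labels corrupted to mimic adversarial label noise.
if __name__ == "__main__":
    d = 5
    w_star = np.ones(d) / np.sqrt(d)

    def sample_batch(N, rng=np.random.default_rng(1)):
        X = rng.standard_normal((N, d))
        y = relu(X @ w_star)
        corrupt = rng.random(N) < 0.05  # 5% corrupted labels
        y[corrupt] += rng.standard_normal(corrupt.sum())
        return X, y

    w_hat = surrogate_sgd(sample_batch, d)
    print("parameter error:", np.linalg.norm(w_hat - w_star))
```

The sketch only illustrates the structure of the quoted setup (batch size $N$, step size $\eta$, iteration count $T$, projection/bound parameters); the paper's guarantee of $O(B^2\alpha^2/(\rho^2\mu^2))\,\mathrm{OPT} + \epsilon$ error depends on its precise choice of surrogate, truncation, and distributional assumptions, which are not reproduced here.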