Robustly Learning a Single Neuron via Sharpness

Authors: Puqian Wang, Nikos Zarifis, Ilias Diakonikolas, Jelena Diakonikolas

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study the problem of learning a single neuron with respect to the L_2^2-loss in the presence of adversarial label noise. We give an efficient algorithm that, for a broad family of activations including ReLUs, approximates the optimal L_2^2-error within a constant factor. Notably, our algorithm succeeds under much milder distributional assumptions compared to prior work. The key ingredient enabling our results is a novel connection to local error bounds from optimization theory. Our algorithm is extremely simple: it amounts to mini-batch SGD on a natural convex surrogate of the problem. As we will explain subsequently, this convex surrogate has been studied before in closely related yet more restricted contexts. Our main technical contribution lies in the analysis, which hinges on a new connection to local error bounds from the theory of optimization.
Researcher Affiliation | Academia | Department of Computer Sciences, University of Wisconsin-Madison, Madison, USA. Correspondence to: Nikos Zarifis <zarifis@wisc.edu>.
Pseudocode | Yes | Algorithm 1 Stochastic Gradient Descent on Surrogate Loss (Page 5) and Algorithm 2 Stochastic Gradient Descent on Surrogate Loss (Appendix D.2).
Open Source Code | No | The paper does not contain any statement about releasing source code for the described methodology, nor does it provide any links to a code repository.
Open Datasets | No | The paper discusses theoretical properties of an 'unknown distribution D' and various distribution classes (e.g., 'heavy-tailed data', 'discrete Gaussians', 'uniform distribution over the cube'), but it does not specify or provide access information (e.g., link, DOI, formal citation) for any publicly available or open dataset used for training or evaluation.
Dataset Splits | No | The paper is theoretical and does not present empirical experiments that would involve specifying training, validation, or test dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe running experiments. Therefore, it does not provide any specific hardware specifications (e.g., GPU/CPU models, memory details) used for computational tasks.
Software Dependencies | No | The paper is theoretical and presents algorithmic analysis without detailing an implementation. Therefore, it does not specify any software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9').
Experiment Setup | Yes | Algorithm 1 Stochastic Gradient Descent on Surrogate Loss. Input: iterations T, sample access from D, batch size N, step size η, bound M. After T = Θ̃((B^2 α^2/(ρ^2 μ^2)) log(W/ε)) iterations with batch size N = Ω(d T (r^2/ε + α^2 M^2)), Algorithm 1 converges to a point w^(T) such that L^2_{D,σ}(w^(T)) = O(B^2 α^2/(ρ^2 μ^2)) · OPT + ε, with probability at least 2/3. An illustrative code sketch of this procedure appears below the table.
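
To make the procedure referenced in the Experiment Setup row concrete, the following is a minimal sketch of mini-batch SGD on a convex surrogate for a single ReLU neuron. It assumes the standard surrogate whose stochastic gradient on an example (x, y) is (σ(w·x) − y)·x, as studied in prior single-neuron work; the zero initialization, the projection radius W, and all function names are illustrative choices, not taken from the paper.

```python
import numpy as np

def relu(t):
    return np.maximum(t, 0.0)

def sgd_on_surrogate(sample_batch, d, T, N, eta, W, rng=None):
    """Illustrative sketch: mini-batch SGD on a convex surrogate
    for a single ReLU neuron.  The stochastic gradient used here,
    (relu(w.x) - y) * x, corresponds to the standard convex surrogate
    from the single-neuron literature; initialization and the
    projection radius W are hypothetical choices."""
    rng = rng or np.random.default_rng(0)
    w = np.zeros(d)                      # illustrative initialization
    for _ in range(T):
        X, y = sample_batch(N, rng)      # fresh mini-batch from D
        # stochastic gradient of the surrogate loss over the batch
        grad = (relu(X @ w) - y) @ X / N
        w = w - eta * grad
        # keep the iterate inside a ball of radius W (illustrative)
        norm = np.linalg.norm(w)
        if norm > W:
            w = w * (W / norm)
    return w

# Example usage with a synthetic sampler (illustrative only):
def make_sampler(w_star, noise=0.1):
    def sample_batch(N, rng):
        X = rng.standard_normal((N, w_star.size))
        y = relu(X @ w_star) + noise * rng.standard_normal(N)
        return X, y
    return sample_batch

w_hat = sgd_on_surrogate(make_sampler(np.array([1.0, -2.0, 0.5])),
                         d=3, T=500, N=128, eta=0.05, W=5.0)
```

Drawing a fresh mini-batch at every iteration mirrors the sample-access model in Algorithm 1's input list; in the paper's guarantee the iteration count T, batch size N, and step size η would be set according to the parameters quoted in the Experiment Setup row.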