Non-Convex SGD Learns Halfspaces with Adversarial Label Noise

Authors: Ilias Diakonikolas, Vasilis Kontonis, Christos Tzamos, Nikos Zarifis

NeurIPS 2020 | Conference PDF | Archive PDF

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | We study the problem of agnostically learning homogeneous halfspaces in the distribution-specific PAC model. For a broad family of structured distributions, including log-concave distributions, we show that non-convex SGD efficiently converges to a solution with misclassification error O(opt) + ϵ, where opt is the misclassification error of the best-fitting halfspace. In sharp contrast, we show that optimizing any convex surrogate inherently leads to misclassification error of ω(opt), even under Gaussian marginals. |
| Researcher Affiliation | Academia | Ilias Diakonikolas, University of Wisconsin-Madison, ilias@cs.wisc.edu; Vasilis Kontonis, University of Wisconsin-Madison, kontonis@wisc.edu; Christos Tzamos, University of Wisconsin-Madison, tzamos@wisc.edu; Nikos Zarifis, University of Wisconsin-Madison, zarifis@wisc.edu |
| Pseudocode | Yes | Algorithm 1: PSGD for f(w) = E_{z∼D}[g(z, w)] (see the sketch after this table) |
| Open Source Code | No | The paper does not provide any statement about, or link to, open-source code. |
| Open Datasets | No | The paper discusses theoretical distributions (e.g., log-concave, Gaussian) and concepts, but does not mention any specific publicly available datasets for empirical training or evaluation, nor does it provide access information for any data. |
| Dataset Splits | No | This is a theoretical paper and does not report empirical experiments with specific dataset splits (training, validation, test). |
| Hardware Specification | No | This is a theoretical paper and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | This is a theoretical paper and does not list any software dependencies with version numbers for experimental reproducibility. |
| Experiment Setup | No | This is a theoretical paper and does not describe any experimental setup details such as hyperparameters or training configurations. |
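The pseudocode row above refers to Algorithm 1, a projected SGD (PSGD) routine for minimizing f(w) = E_{z∼D}[g(z, w)]. Below is a minimal Python sketch of that idea, assuming a logistic-type non-convex surrogate with a fixed temperature s and projection onto the unit sphere; the function names (`sigmoidal_loss_grad`, `psgd_unit_sphere`), the step size, the temperature, and the choice to return a normalized averaged iterate are illustrative assumptions, not the paper's exact algorithm or parameters.

```python
import numpy as np

def sigmoidal_loss_grad(w, x, y, s=0.1):
    """Gradient in w of the non-convex surrogate g((x, y), w) = sigma(-y <w, x> / s),
    where sigma(t) = 1 / (1 + exp(-t)). Illustrative choice; the paper's exact
    surrogate and temperature schedule may differ."""
    t = -y * np.dot(w, x) / s
    sig = 1.0 / (1.0 + np.exp(-t))
    # Chain rule: sigma'(t) = sigma(t) (1 - sigma(t)), and d/dw t = -y x / s.
    return sig * (1.0 - sig) * (-y * x / s)

def psgd_unit_sphere(samples, labels, steps=10_000, eta=0.05, seed=0):
    """Projected SGD for f(w) = E_{z~D}[g(z, w)] over the unit sphere.
    Returns the normalized average iterate; averaging is one common choice in
    non-convex SGD analyses, and the paper's Algorithm 1 may select differently."""
    rng = np.random.default_rng(seed)
    d = samples.shape[1]
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)          # start on the unit sphere
    w_sum = np.zeros(d)
    for _ in range(steps):
        i = rng.integers(len(samples))
        w = w - eta * sigmoidal_loss_grad(w, samples[i], labels[i])
        w /= np.linalg.norm(w)      # project back onto the unit sphere
        w_sum += w
    return w_sum / np.linalg.norm(w_sum)
```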
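As a usage sketch under the same assumptions: draw Gaussian marginals, label points with a ground-truth halfspace w*, flip an opt-fraction of labels as a crude stand-in for adversarial label noise, and compare the learned halfspace's misclassification error against opt. The constants (d, n, opt) are arbitrary illustrative values, not settings from the paper.

```python
rng = np.random.default_rng(1)
d, n, opt = 20, 50_000, 0.05

w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)     # ground-truth homogeneous halfspace

X = rng.standard_normal((n, d))      # Gaussian marginals
y = np.sign(X @ w_star)
flip = rng.random(n) < opt           # flip an opt-fraction of labels
y[flip] *= -1

w_hat = psgd_unit_sphere(X, y)
err = np.mean(np.sign(X @ w_hat) != y)
print(f"opt = {opt:.3f}, PSGD misclassification error = {err:.3f}")
```

Random label flipping is weaker than a true adversary, so this only illustrates the interface of the guarantee (error scaling with opt), not the worst-case behavior the theorem covers.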