Statistical-Query Lower Bounds via Functional Gradients

Authors: Surbhi Goel, Aravind Gollakota, Adam Klivans

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We give the first statistical-query lower bounds for agnostically learning any nonpolynomial activation with respect to Gaussian marginals (e.g., ReLU, sigmoid, sign). For the specific problem of ReLU regression (equivalently, agnostically learning a ReLU), we show that any statistical-query algorithm with tolerance $n^{-(1/\epsilon)^b}$ must use at least $2^{n^c \epsilon}$ queries for some constants $b, c > 0$, where $n$ is the dimension and $\epsilon$ is the accuracy parameter. Our results rule out general (as opposed to correlational) SQ learning algorithms, which is unusual for real-valued learning problems. Our techniques involve a gradient boosting procedure for amplifying recent lower bounds due to Diakonikolas et al. (COLT 2020) and Goel et al. (ICML 2020) on the SQ dimension of functions computed by two-layer neural networks. The crucial new ingredient is the use of a nonstandard convex functional during the boosting procedure. This also yields a best-possible reduction between two commonly studied models of learning: agnostic learning and probabilistic concepts. (The tolerance/query bound is restated as a displayed formula below the table.)
Researcher Affiliation | Collaboration | Surbhi Goel, Microsoft Research, New York, NY, USA, goel.surbhi@microsoft.com; Aravind Gollakota, University of Texas at Austin, Austin, TX, USA, aravindg@cs.utexas.edu; Adam Klivans, University of Texas at Austin, Austin, TX, USA, klivans@cs.utexas.edu
Pseudocode | Yes | Algorithm 1: Frank-Wolfe gradient descent over a generic inner product space ... Algorithm 2: Frank-Wolfe for solving $\min_{f \in \mathcal{F}} L_{\mathrm{sur}}(f)$. (A minimal Frank-Wolfe sketch appears below the table.)
Open Source Code | No | The paper does not provide any explicit statements about releasing open-source code or links to a code repository for the methodology described.
Open Datasets | No | The paper is theoretical in nature, focusing on lower bounds and functional gradients. It does not describe empirical experiments with datasets, and therefore no public dataset access information is provided.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments involving dataset splits (training, validation, or test).
Hardware Specification | No | The paper does not provide any details about the hardware specifications used for running experiments, as it is a theoretical paper.
Software Dependencies | No | The paper does not list any specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and focuses on mathematical proofs and algorithmic analysis. It does not describe an empirical experimental setup with hyperparameters or system-level training settings.
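As promised above, here is the abstract's ReLU-regression lower bound in displayed form. This is a restatement only; the symbols $\tau$ (SQ tolerance) and $q$ (query count) are notation introduced here, not taken from the paper.

```latex
% Restatement of the abstract's claim: any SQ algorithm for ReLU
% regression with tolerance tau must make at least q queries, where
% b, c > 0 are the (unspecified) constants from the paper.
\[
  \tau \;=\; n^{-(1/\epsilon)^{b}}
  \quad\Longrightarrow\quad
  q \;\ge\; 2^{\,n^{c}\epsilon},
  \qquad \text{for some constants } b, c > 0.
\]
```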
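The pseudocode row above references a generic Frank-Wolfe procedure. Below is a minimal runnable sketch in R^n, used as a finite-dimensional stand-in for the paper's inner-product-space setting; the names `frank_wolfe`, `grad_obj`, and `l1_lmo` are illustrative assumptions, not the paper's Algorithm 1 or 2.

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, num_iters=100):
    """Minimal Frank-Wolfe loop over R^n (a sketch, not the paper's algorithm).

    grad(x) -- gradient oracle for the (convex, smooth) objective
    lmo(g)  -- linear minimization oracle: argmin over s in C of <g, s>,
               where C is a compact convex feasible set
    x0      -- feasible starting point
    """
    x = np.asarray(x0, dtype=float)
    for t in range(num_iters):
        g = grad(x)
        s = lmo(g)                          # direction-finding subproblem
        gamma = 2.0 / (t + 2.0)             # standard open-loop step size
        x = (1.0 - gamma) * x + gamma * s   # convex combination stays in C
    return x

# Usage sketch: minimize ||x - b||^2 over the unit L1 ball.
b = np.array([0.6, -0.2, 0.1])

def grad_obj(x):
    return 2.0 * (x - b)

def l1_lmo(g):
    # The minimizer of <g, s> over the L1 ball sits at a signed vertex.
    s = np.zeros_like(g)
    i = int(np.argmax(np.abs(g)))
    s[i] = -np.sign(g[i])
    return s

x_star = frank_wolfe(grad_obj, l1_lmo, x0=np.zeros(3))
print(x_star)  # approaches b, since ||b||_1 < 1
```

The 2/(t+2) schedule is the standard open-loop Frank-Wolfe step size; every iterate stays feasible because it is a convex combination of feasible points, which is what lets the method run over a generic inner product space given only a linear minimization oracle.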