Statistical-Query Lower Bounds via Functional Gradients
Authors: Surbhi Goel, Aravind Gollakota, Adam Klivans
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We give the first statistical-query lower bounds for agnostically learning any nonpolynomial activation with respect to Gaussian marginals (e.g., ReLU, sigmoid, sign). For the specific problem of ReLU regression (equivalently, agnostically learning a ReLU), we show that any statistical-query algorithm with tolerance $n^{-(1/\epsilon)^b}$ must use at least $2^{n^c}\epsilon$ queries for some constants $b, c > 0$, where $n$ is the dimension and $\epsilon$ is the accuracy parameter. Our results rule out general (as opposed to correlational) SQ learning algorithms, which is unusual for real-valued learning problems. Our techniques involve a gradient boosting procedure for amplifying recent lower bounds due to Diakonikolas et al. (COLT 2020) and Goel et al. (ICML 2020) on the SQ dimension of functions computed by two-layer neural networks. The crucial new ingredient is the use of a nonstandard convex functional during the boosting procedure. This also yields a best-possible reduction between two commonly studied models of learning: agnostic learning and probabilistic concepts. (A toy sketch of the SQ oracle model appears below the table.) |
| Researcher Affiliation | Collaboration | Surbhi Goel (Microsoft Research, New York, NY, USA; goel.surbhi@microsoft.com); Aravind Gollakota (University of Texas at Austin, Austin, TX, USA; aravindg@cs.utexas.edu); Adam Klivans (University of Texas at Austin, Austin, TX, USA; klivans@cs.utexas.edu) |
| Pseudocode | Yes | Algorithm 1: Frank-Wolfe gradient descent over a generic inner product space... Algorithm 2: Frank-Wolfe for solving $\min_{f \in \mathcal{F}} L_{\mathrm{sur}}(f)$. (A minimal Frank-Wolfe sketch appears below the table.) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing open-source code or links to a code repository for the methodology described. |
| Open Datasets | No | The paper is theoretical in nature, focusing on lower bounds and functional gradients. It does not describe empirical experiments with datasets, and therefore no public dataset access information is provided. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments involving dataset splits (training, validation, or test). |
| Hardware Specification | No | The paper does not provide any details about the hardware specifications used for running experiments, as it is a theoretical paper. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on mathematical proofs and algorithmic analysis. It does not describe an empirical experimental setup with hyperparameters or system-level training settings. |
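To make the abstract's setting concrete, the following is a minimal sketch of a statistical-query (STAT) oracle, which answers expectation queries up to an additive tolerance. The names `sq_oracle`, `phi`, and `tau` are our own illustrative choices, not the paper's; we simulate the tolerance with random noise, whereas the paper's lower bound holds even when the oracle answers adversarially within the tolerance band.

```python
import numpy as np

rng = np.random.default_rng(0)

def sq_oracle(phi, sample, tau):
    """Toy STAT(tau) oracle: returns E[phi(x, y)] up to additive error tau.

    Simulated here with uniform noise on an empirical mean; a real SQ
    oracle may return any value within tau of the true expectation.
    """
    est = np.mean([phi(x, y) for x, y in sample])
    return est + rng.uniform(-tau, tau)

# Example: labeled Gaussian data for a ReLU target (the setting of the
# paper's ReLU-regression lower bound; dimensions/values are illustrative).
n = 5
w = rng.standard_normal(n)
w /= np.linalg.norm(w)
xs = rng.standard_normal((1000, n))
ys = np.maximum(xs @ w, 0.0)                # ReLU labels
sample = list(zip(xs, ys))

# A correlational query has the restricted form phi(x, y) = g(x) * y;
# the paper rules out even general (non-correlational) queries.
corr = sq_oracle(lambda x, y: x[0] * y, sample, tau=0.01)
```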
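The paper's Algorithm 1 is Frank-Wolfe gradient descent over a generic inner product space. Below is a minimal finite-dimensional sketch under assumed simplifications: the helper names (`frank_wolfe`, `grad`, `lmo`) are hypothetical, and the l1-ball example objective is ours, not the paper's surrogate loss $L_{\mathrm{sur}}$.

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, steps=100):
    """Sketch of Frank-Wolfe over R^n (stand-in for a generic
    inner product space).

    grad -- gradient of the convex objective at the current point
    lmo  -- linear minimization oracle: argmin_{s in C} <grad, s>
    x0   -- feasible starting point in the constraint set C
    """
    x = x0
    for t in range(steps):
        g = grad(x)
        s = lmo(g)                       # direction chosen by the LMO
        gamma = 2.0 / (t + 2.0)          # standard step-size schedule
        x = (1 - gamma) * x + gamma * s  # convex combination stays in C
    return x

# Example: minimize ||x - b||^2 over the unit l1 ball.
b = np.array([0.6, -0.2, 0.1])
grad = lambda x: 2 * (x - b)

def lmo(g):
    # argmin of <g, s> over the l1 ball: all mass on the largest |g_i|
    s = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    s[i] = -np.sign(g[i])
    return s

x_star = frank_wolfe(grad, lmo, np.zeros(3))
```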