Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals
Authors: Surbhi Goel, Sushrut Karmalkar, Adam Klivans
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We consider the problem of computing the best-fitting ReLU with respect to square-loss on a training set when the examples have been drawn according to a spherical Gaussian distribution (the labels can be arbitrary). Let opt < 1 be the population loss of the best-fitting ReLU. We prove: (1) finding a ReLU with square-loss opt + ϵ is as hard as the problem of learning sparse parities with noise, widely thought to be computationally intractable; this is the first hardness result for learning a ReLU with respect to Gaussian marginals, and our results imply unconditionally that gradient descent cannot converge to the global minimum in polynomial time. (2) There exists an efficient approximation algorithm for finding the best-fitting ReLU that achieves error O(opt^{2/3}); the algorithm uses a novel reduction to noisy halfspace learning with respect to 0/1 loss. (The problem setup is sketched informally after this table.) |
| Researcher Affiliation | Academia | Surbhi Goel, Department of Computer Science, University of Texas at Austin, surbhi@cs.utexas.edu; Sushrut Karmalkar, Department of Computer Science, University of Texas at Austin, sushrutk@cs.utexas.edu; Adam R. Klivans, Department of Computer Science, University of Texas at Austin, klivans@cs.utexas.edu |
| Pseudocode | Yes | Algorithm 1: Learning Sparse Parities with Noise using an Agnostic ReLU Learner; Algorithm 2 |
| Open Source Code | No | The paper does not include any explicit statements about providing open-source code for the methodology, nor does it provide links to code repositories. |
| Open Datasets | No | The paper assumes data drawn 'according to a spherical Gaussian distribution' for its theoretical analysis, but it does not specify or provide concrete access information (link, DOI, repository, or citation) for any publicly available or open dataset. |
| Dataset Splits | Yes | Input: training set S of M1 samples (x_i, y_i)_{i=1}^{M1}, validation set V of M2 samples (x_i, y_i)_{i=M1+1}^{M1+M2}, error parameter ϵ... Using standard concentration inequalities for sub-Gaussian and sub-exponential random variables [Ver], we see that using a validation set of M2 = 100/ϵ^2 samples, we have for all j, \|êrr_{V_j}(h_j) − err_{D_j}(h_j)\| ≤ ϵ/4. (The validation-selection step is sketched informally after this table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types) used for running experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any ancillary software components or libraries (e.g., Python, PyTorch, TensorFlow, specific solvers). |
| Experiment Setup | No | The paper mentions theoretical parameters like 'error parameter ϵ', 'M1 samples', and 'M2 samples', but these are not 'concrete hyperparameter values' or 'system-level training settings' as would be expected for an empirical experimental setup (e.g., learning rate, batch size, optimizer settings). |
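The paper is purely theoretical and releases no code, so the snippet below is only an illustrative sketch of the problem it studies: fitting a ReLU x ↦ max(0, w·x) under square loss when the examples are drawn from a spherical Gaussian and the labels may be arbitrary. Every name, constant, and noise model here is an assumption made for illustration, not the authors' setup.

```python
# Hypothetical illustration of the learning problem (not the authors' code).
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 10_000

# Examples drawn from a spherical Gaussian, as in the paper's model.
X = rng.standard_normal((n, d))

# "Arbitrary" labels: a planted ReLU with a fraction of labels corrupted.
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
y = np.maximum(0.0, X @ w_star)
corrupt = rng.random(n) < 0.1            # corrupt 10% of the labels
y[corrupt] = rng.standard_normal(corrupt.sum())

def square_loss(w, X, y):
    """Empirical square loss of the ReLU hypothesis x -> max(0, w.x)."""
    return np.mean((np.maximum(0.0, X @ w) - y) ** 2)

# opt is the loss of the best-fitting ReLU; the planted direction only
# gives an upper bound on it.
print("loss of planted ReLU (upper bound on opt):", square_loss(w_star, X, y))
```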
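The Dataset Splits row quotes the paper's validation step: with M2 = 100/ϵ^2 validation samples, standard concentration bounds make every candidate's empirical error ϵ/4-close to its population error, so selecting the candidate with the smallest validation loss is sound. The sketch below illustrates that selection rule under assumed synthetic data and stand-in candidate hypotheses; it is not the authors' implementation.

```python
# Hypothetical sketch of validation-based model selection (not the authors' code).
import numpy as np

rng = np.random.default_rng(1)
d, eps = 20, 0.1
M2 = int(np.ceil(100 / eps ** 2))        # validation-set size quoted from the paper

# Assumed synthetic validation data drawn from a spherical Gaussian.
X_val = rng.standard_normal((M2, d))
w_true = rng.standard_normal(d)
w_true /= np.linalg.norm(w_true)
y_val = np.maximum(0.0, X_val @ w_true)

def val_error(w):
    """Empirical square loss of the ReLU hypothesis w on the validation set."""
    return np.mean((np.maximum(0.0, X_val @ w) - y_val) ** 2)

# Stand-in candidate hypotheses h_1, ..., h_k (perturbations of w_true).
candidates = [w_true + 0.5 * rng.standard_normal(d) for _ in range(10)]

# Pick the candidate with the smallest validation error.
best = min(candidates, key=val_error)
print("validation error of selected hypothesis:", val_error(best))
```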