Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals

Authors: Surbhi Goel, Sushrut Karmalkar, Adam Klivans

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We consider the problem of computing the best-fitting ReLU with respect to square-loss on a training set when the examples have been drawn according to a spherical Gaussian distribution (the labels can be arbitrary). Let opt < 1 be the population loss of the best-fitting ReLU. We prove: (1) Finding a ReLU with square-loss opt + ε is as hard as the problem of learning sparse parities with noise, widely thought to be computationally intractable. This is the first hardness result for learning a ReLU with respect to Gaussian marginals, and our results imply unconditionally that gradient descent cannot converge to the global minimum in polynomial time. (2) There exists an efficient approximation algorithm for finding the best-fitting ReLU that achieves error O(opt^{2/3}). The algorithm uses a novel reduction to noisy halfspace learning with respect to 0/1 loss. (A minimal sketch of the square-loss objective under Gaussian marginals appears after this table.)
Researcher Affiliation | Academia | Surbhi Goel, Department of Computer Science, University of Texas at Austin, surbhi@cs.utexas.edu; Sushrut Karmalkar, Department of Computer Science, University of Texas at Austin, sushrutk@cs.utexas.edu; Adam R. Klivans, Department of Computer Science, University of Texas at Austin, klivans@cs.utexas.edu
Pseudocode | Yes | Algorithm 1: Learning Sparse Parities with Noise using an Agnostic ReLU learner; Algorithm 2
Open Source Code | No | The paper does not include any explicit statements about providing open-source code for the methodology, nor does it provide links to code repositories.
Open Datasets | No | The paper assumes data drawn 'according to a spherical Gaussian distribution' for its theoretical analysis, but it does not specify or provide concrete access information (link, DOI, repository, or citation) for any publicly available or open dataset.
Dataset Splits | Yes | Input: Training set S of M1 samples $(x_i, y_i)_{i=1}^{M_1}$, validation set V of M2 samples $(x_i, y_i)_{i=M_1+1}^{M_1+M_2}$, error parameter $\epsilon$ ... Using standard concentration inequalities for sub-Gaussian and sub-exponential random variables [Ver], we see that using a validation set of $M_2 = 100/\epsilon^2$ samples, we have for all $j$, $|\widehat{\mathrm{err}}_{V_j}(h_j) - \mathrm{err}_{\mathcal{D}_j}(h_j)| \le \epsilon/4$. (See the validation-split sketch after this table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types) used for running experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any ancillary software components or libraries (e.g., Python, PyTorch, TensorFlow, specific solvers).
Experiment Setup | No | The paper mentions theoretical parameters like 'error parameter ε', 'M1 samples', and 'M2 samples', but these are not 'concrete hyperparameter values' or 'system-level training settings' as would be expected for an empirical experimental setup (e.g., learning rate, batch size, optimizer settings).
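
To make the objective quoted in the Research Type row concrete, below is a minimal Python sketch (not from the paper) of the empirical square-loss of a candidate ReLU when the marginals are spherical Gaussian and the labels may be arbitrary. The dimension d, sample count m, noise level, and the candidate weight vector w_hat are illustrative assumptions.

    import numpy as np

    def relu(z):
        return np.maximum(z, 0.0)

    def empirical_square_loss(w, X, y):
        """Average squared error of the ReLU hypothesis x -> ReLU(w . x)."""
        preds = relu(X @ w)
        return np.mean((preds - y) ** 2)

    # Illustrative setup (assumed, not from the paper): spherical Gaussian
    # marginals with labels that need not come from any ReLU.
    rng = np.random.default_rng(0)
    d, m = 10, 5000
    X = rng.standard_normal((m, d))                      # x_i ~ N(0, I_d)
    w_true = rng.standard_normal(d)
    w_true /= np.linalg.norm(w_true)
    y = relu(X @ w_true) + 0.1 * rng.standard_normal(m)  # arbitrary/noisy labels

    w_hat = rng.standard_normal(d)                       # some candidate hypothesis
    print(empirical_square_loss(w_hat, X, y))
    print(empirical_square_loss(w_true, X, y))           # near the noise level

The population loss opt referenced in the abstract is the expectation of this quantity over the Gaussian distribution rather than a finite-sample average.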
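The Dataset Splits row quotes a held-out validation set of M2 = 100/ε² samples used to pick among candidate hypotheses, with each empirical validation error concentrating to within ε/4 of its population value. The following hedged Python sketch shows that standard split-and-validate step; the candidate construction, the use of square-loss for validation, and the helper names are assumptions for illustration, not the paper's code.

    import numpy as np

    def validation_select(candidates, X_val, y_val, loss_fn):
        """Return the candidate with the smallest empirical validation error.

        With M2 = 100 / eps**2 validation samples, standard concentration
        bounds keep each empirical error within eps/4 of its population
        value with good probability, so the selected candidate is near-best.
        """
        errs = [np.mean(loss_fn(h(X_val), y_val)) for h in candidates]
        best = int(np.argmin(errs))
        return candidates[best], errs[best]

    # Illustrative usage (assumed): split M1 + M2 samples into train / validation.
    eps = 0.1
    M2 = int(np.ceil(100 / eps ** 2))        # validation size from the quoted bound
    rng = np.random.default_rng(1)
    d, M1 = 10, 4000
    X = rng.standard_normal((M1 + M2, d))
    y = np.maximum(X @ rng.standard_normal(d), 0.0)
    X_tr, y_tr = X[:M1], y[:M1]              # training split: would produce the h_j
    X_val, y_val = X[M1:], y[M1:]            # validation split: used only for selection

    # Hypothetical candidates h_j (here, ReLUs with different weight guesses).
    candidates = [(lambda X, w=rng.standard_normal(d): np.maximum(X @ w, 0.0))
                  for _ in range(5)]
    square_loss = lambda pred, y: (pred - y) ** 2
    h_best, err_best = validation_select(candidates, X_val, y_val, square_loss)
    print(err_best)

This only illustrates the split-and-validate mechanism the row cites; the paper's actual candidates h_j come from its reduction to noisy halfspace learning.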