Agnostic Learning of General ReLU Activation Using Gradient Descent

Authors: Pranjal Awasthi, Alex Tang, Aravindan Vijayaraghavan

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We provide a convergence analysis of gradient descent for the problem of agnostically learning a single ReLU function under Gaussian distributions. Our main result establishes that, starting from random initialization, gradient descent outputs in a polynomial number of iterations, with high probability, a ReLU function whose error is within a constant factor of the optimal, i.e., it is guaranteed to achieve an error of O(OPT) (a schematic form of this guarantee is given after this table). The paper also includes a section titled "Overview of the Analysis (Proof of Theorem 1.1)".
Researcher Affiliation | Collaboration | Pranjal Awasthi, Google Research (pranjalawasthi@google.com); Alex Tang, Northwestern University (alextang@u.northwestern.edu); Aravindan Vijayaraghavan, Northwestern University (aravindv@northwestern.edu).
Pseudocode | Yes | Multiscale random initialization: We are given a parameter M such that ‖v‖₂ ∈ [2^(−log M), 2^(log M)] (note that M can have large dependencies on d and other parameters; our guarantees will be polynomial in log M). A random initializer w = (σŵ, 0) is drawn from D_unknown(M) as follows: 1. Pick j uniformly at random from {−⌈log M⌉, −⌈log M⌉ + 1, ..., −1, 0, 1, ..., ⌈log M⌉}. 2. σ ∈ ℝ₊ is drawn according to D_σ as follows: we first pick g ~ N(0, 1) and set σ = 2^j·|g|. 3. A uniformly random unit vector ŵ ∈ ℝ^d is drawn and we output w = σŵ. (A runnable sketch of this initialization appears after this table.)
Open Source Code | No | The paper does not contain any statement about releasing source code for the described methodology, nor does it provide any links to a code repository.
Open Datasets | No | The paper specifies theoretical assumptions about the data distribution, such as "Let D be a distribution over (x̃, y) ∈ ℝ^d × ℝ where the marginal over x̃ is the standard Gaussian N(0, I)." However, it does not refer to or provide access information for any concrete, publicly available dataset used for training or empirical evaluation.
Dataset Splits | No | As the paper is theoretical and does not conduct experiments on a specific dataset, it does not provide details regarding training, validation, or test dataset splits.
Hardware Specification | No | This is a theoretical paper focused on convergence analysis; no hardware specifications for running experiments are mentioned.
Software Dependencies | No | The paper is theoretical and does not describe any implementation that would require specific software dependencies with version numbers.
Experiment Setup | No | As a theoretical paper, it does not describe an experimental setup with specific hyperparameters or system-level training settings; it only mentions a "standard gradient descent algorithm with a fixed learning rate" (an illustrative sketch of such a loop is included after this table).
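For reference, the O(OPT) guarantee summarized in the Research Type row can be written schematically as below. This is a hedged restatement based only on the abstract quoted above: the symbols w and b for the parameters returned by gradient descent and v, b₀ for the best-fitting ReLU are assumptions of this note, and the exact constants and error definition are as in the paper's Theorem 1.1.

```latex
% Schematic O(OPT) guarantee: the ReLU returned by gradient descent is
% competitive, up to a constant factor, with the best ReLU fit under D,
% where the marginal of x under D is the standard Gaussian N(0, I).
\mathbb{E}_{(x,y)\sim\mathcal{D}}\big[(\mathrm{ReLU}(w\cdot x + b) - y)^2\big] \le O(\mathrm{OPT}),
\quad\text{where}\quad
\mathrm{OPT} = \min_{v,\,b_0}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[(\mathrm{ReLU}(v\cdot x + b_0) - y)^2\big].
```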
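The multiscale random initialization quoted in the Pseudocode row translates almost directly into code. The following is a minimal Python/NumPy sketch, not the authors' implementation: the function name is ours, the logarithm is taken base 2, and the bias is returned explicitly as 0, all of which are illustrative assumptions.

```python
import numpy as np

def multiscale_random_init(d, M, rng=None):
    """Multiscale random initialization sketch: pick a dyadic scale level j,
    set sigma = 2**j * |g| with g ~ N(0, 1), and return sigma times a
    uniformly random unit direction, with the bias initialized to 0."""
    rng = np.random.default_rng() if rng is None else rng
    log_M = int(np.ceil(np.log2(M)))               # assumes log means log base 2
    j = rng.integers(-log_M, log_M + 1)            # step 1: j uniform in {-ceil(log M), ..., ceil(log M)}
    sigma = 2.0 ** j * abs(rng.standard_normal())  # step 2: sigma = 2^j * |g|, g ~ N(0, 1)
    w_hat = rng.standard_normal(d)                 # step 3: uniformly random unit vector in R^d
    w_hat /= np.linalg.norm(w_hat)
    return sigma * w_hat, 0.0                      # output w = sigma * w_hat, bias 0
```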
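Since the Experiment Setup row notes only a "standard gradient descent algorithm with a fixed learning rate", here is an illustrative sketch of such a loop on the empirical squared loss of a single ReLU with bias, reusing the multiscale_random_init sketch above. The loss form, learning rate, iteration count, and sample sizes are assumptions for illustration, not values taken from the paper.

```python
import numpy as np

def gd_fixed_lr(X, y, w, b, lr=0.05, iters=1000):
    """Gradient descent with a fixed learning rate on the empirical
    squared loss (1/2n) * sum_i (ReLU(w.x_i + b) - y_i)^2."""
    n = len(y)
    for _ in range(iters):
        z = X @ w + b
        resid = np.maximum(z, 0.0) - y        # ReLU(w.x_i + b) - y_i
        grad_scale = resid * (z > 0) / n      # ReLU'(z) = 1{z > 0}
        w = w - lr * (X.T @ grad_scale)       # fixed-step update on the weights
        b = b - lr * grad_scale.sum()         # fixed-step update on the bias
    return w, b

# Illustrative run with Gaussian inputs, matching the paper's marginal
# assumption x ~ N(0, I); the target here is an arbitrary noisy ReLU.
rng = np.random.default_rng(0)
d, n = 10, 5000
X = rng.standard_normal((n, d))
v = rng.standard_normal(d)
y = np.maximum(X @ v + 0.5, 0.0) + 0.1 * rng.standard_normal(n)
w0, b0 = multiscale_random_init(d, M=2.0 ** 10, rng=rng)
w_fit, b_fit = gd_fixed_lr(X, y, w0, b0)
```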