Can SGD Learn Recurrent Neural Networks with Provable Generalization?
Authors: Zeyuan Allen-Zhu, Yuanzhi Li
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Existing generalization bounds for RNNs scale exponentially with the input length, significantly limiting their practical implications. In this paper, we show that, using vanilla stochastic gradient descent (SGD), RNNs can actually learn some notable concept class efficiently, meaning that both time and sample complexity scale polynomially in the input length (or almost polynomially, depending on the concept). This concept class at least includes functions where each output token is generated from inputs of earlier tokens using a smooth two-layer neural network (see the first sketch after this table). The work is purely theoretical; for instance, the proof of Theorem 1 divides into four conceptual steps. |
| Researcher Affiliation | Collaboration | Zeyuan Allen-Zhu, Microsoft Research AI (zeyuan@csail.mit.edu); Yuanzhi Li, Carnegie Mellon University (yuanzhil@andrew.cmu.edu) |
| Pseudocode | No | The paper describes the vanilla SGD update rule with an equation but does not include any structured pseudocode or algorithm blocks (a hedged sketch of such an update appears after this table). |
| Open Source Code | No | The paper states 'Full version and future updates can be found on https://arxiv.org/abs/1902.01028.', but this link points to the full version of the paper itself; there is no link to, or statement about, the release of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical: it models data as drawn from an unknown distribution D into a training set Z, but it does not name or provide access information for any publicly available dataset. |
| Dataset Splits | No | This is a theoretical paper that conducts no experiments; consequently, it provides no dataset split information for training, validation, or testing. |
| Hardware Specification | No | This is a theoretical paper that describes no experiments; no hardware details are provided. |
| Software Dependencies | No | This is a theoretical paper that describes no experiments; no ancillary software dependencies or version numbers are listed. |
| Experiment Setup | No | This is a theoretical paper that describes no experiments; no experimental setup details or hyperparameters are provided. |
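
The concept class quoted in the Research Type row can be made concrete with a small sketch. The snippet below generates each output token `y_t` from the earlier input tokens via a smooth two-layer network, as the abstract describes; the dimensions, the tanh activation, and the choice to combine earlier tokens by averaging are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# Hypothetical instance of the concept class: each output token y_t is
# produced from input tokens x_1..x_t by a smooth two-layer network.
# Dimensions, tanh, and averaging over earlier tokens are assumptions.
rng = np.random.default_rng(0)

d, m, L = 8, 16, 10          # input dim, hidden width, sequence length
W = rng.normal(size=(m, d))  # first-layer weights of the target network
a = rng.normal(size=m)       # second-layer (output) weights

def smooth_act(z):
    """A smooth activation (tanh); the paper requires smoothness."""
    return np.tanh(z)

def target_output(xs, t):
    """y_t from tokens x_1..x_t; averaging them is an illustrative choice."""
    context = np.mean(xs[: t + 1], axis=0)
    return a @ smooth_act(W @ context)

xs = rng.normal(size=(L, d))                     # one input sequence
ys = [target_output(xs, t) for t in range(L)]    # one output per token
```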
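For the Pseudocode row: the paper gives the vanilla SGD update only as an equation, so the following is a minimal runnable sketch in its spirit, assuming an Elman-style RNN h_t = ReLU(W h_{t-1} + A x_t) with readout y_t = B h_t, squared loss, and training of the recurrent weights W only. The finite-difference gradient stands in for backpropagation through time purely to keep the sketch short; the step size and dimensions are assumptions.

```python
import numpy as np

# Hedged sketch of one vanilla SGD step, W <- W - eta * grad, on an
# Elman-style ReLU RNN. Not the paper's exact algorithm: the gradient is
# approximated by finite differences instead of backprop through time.
rng = np.random.default_rng(0)
d, m, L, eta = 4, 32, 6, 1e-2

A = rng.normal(size=(m, d)) / np.sqrt(m)   # input weights (held fixed)
B = rng.normal(size=(1, m)) / np.sqrt(m)   # output weights (held fixed)
W = rng.normal(size=(m, m)) / np.sqrt(m)   # recurrent weights (trained)

def forward(W, xs):
    """Elman RNN: h_t = ReLU(W h_{t-1} + A x_t), prediction y_t = B h_t."""
    h = np.zeros(m)
    preds = []
    for x in xs:
        h = np.maximum(W @ h + A @ x, 0.0)
        preds.append(B @ h)
    return np.array(preds)

def loss(W, xs, targets):
    """Squared loss over the whole sequence (an illustrative choice)."""
    return 0.5 * np.sum((forward(W, xs).ravel() - targets) ** 2)

def sgd_step(W, xs, targets, eps=1e-5):
    """One SGD update; finite differences keep the sketch autodiff-free."""
    grad = np.zeros_like(W)
    base = loss(W, xs, targets)
    for i in range(m):
        for j in range(m):
            Wp = W.copy()
            Wp[i, j] += eps
            grad[i, j] = (loss(Wp, xs, targets) - base) / eps
    return W - eta * grad

xs = rng.normal(size=(L, d))     # one training sequence
targets = rng.normal(size=L)     # its labels
W = sgd_step(W, xs, targets)     # one vanilla SGD update on one sample
```

In practice the finite-difference loop would be replaced by backpropagation through time or autodiff; it is used here only so that the update W <- W - eta * grad is explicit.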