Can SGD Learn Recurrent Neural Networks with Provable Generalization?

Authors: Zeyuan Allen-Zhu, Yuanzhi Li

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | The paper is purely theoretical. Existing generalization bounds for RNNs scale exponentially with the input length, significantly limiting their practical implications; the authors show that vanilla stochastic gradient descent (SGD) can train an RNN to learn a notable concept class efficiently, meaning that both time and sample complexity scale polynomially (or almost polynomially, depending on the concept) in the input length. This concept class includes at least functions where each output token is generated from earlier input tokens by a smooth two-layer neural network (an illustrative sketch of such a target appears after this table). The proof of Theorem 1 divides into four conceptual steps.
Researcher Affiliation | Collaboration | Zeyuan Allen-Zhu (Microsoft Research AI, zeyuan@csail.mit.edu) and Yuanzhi Li (Carnegie Mellon University, yuanzhil@andrew.cmu.edu).
Pseudocode | No | The paper states the SGD update rule as an equation but includes no structured pseudocode or algorithm blocks (a generic sketch of such an update appears after this table).
Open Source Code | No | The paper states 'Full version and future updates can be found on https://arxiv.org/abs/1902.01028', but provides no link to, or statement about, a release of source code for the described methodology.
Open Datasets | No | The paper is theoretical: data are drawn from an unknown distribution D and collected into a training set Z, and no publicly available dataset is used or referenced.
Dataset Splits | No | The paper conducts no experiments, so no training/validation/test split information is provided.
Hardware Specification | No | The paper conducts no experiments, so no hardware details are provided.
Software Dependencies | No | The paper conducts no experiments and lists no software dependencies or version numbers.
Experiment Setup | No | The paper conducts no experiments, so no experimental setup or hyperparameters are provided.
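
For concreteness, here is a minimal sketch of the kind of target the abstract describes: each output token produced from earlier input tokens by a smooth two-layer network. All dimensions, the tanh activation, and the exact dependence on earlier tokens are illustrative assumptions, not the paper's definition of the concept class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: input token, output token, hidden width, sequence length.
d_in, d_out, m, L = 8, 4, 16, 10

# A smooth two-layer target network (tanh chosen for smoothness; illustrative only).
W1 = rng.standard_normal((m, d_in)) / np.sqrt(d_in)
W2 = rng.standard_normal((d_out, m)) / np.sqrt(m)

def target_token(tokens, ell):
    """Output token at position ell, generated from inputs up to position ell
    via a smooth two-layer network. The paper's concept class is broader;
    this is one simple member of it."""
    x = tokens[ell - 1]            # depends only on tokens seen so far
    return W2 @ np.tanh(W1 @ x)   # smooth two-layer network

tokens = [rng.standard_normal(d_in) for _ in range(L)]
outputs = [target_token(tokens, ell) for ell in range(1, L + 1)]
```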
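Since the report notes that the training procedure is given only as an update-rule equation, the following is a hedged sketch of vanilla SGD applied to an Elman-style ReLU RNN. The recurrence, the choice of which matrix is updated, and the step size are plausible stand-ins; the paper's exact initialization scaling, loss, and schedule are not reproduced here.

```python
import numpy as np

def elman_forward(xs, W, A, B):
    """Elman-style recurrence h_l = relu(W h_{l-1} + A x_l) with outputs
    y_l = B h_l. A plausible stand-in for the architecture analyzed,
    not the paper's exact parameterization."""
    h = np.zeros(W.shape[0])
    ys = []
    for x in xs:
        h = np.maximum(W @ h + A @ x, 0.0)  # ReLU recurrent step
        ys.append(B @ h)
    return ys

def sgd_step(W, grad_W, eta):
    """One vanilla SGD update, W <- W - eta * grad_W, matching the generic
    form of the update-rule equation the paper states instead of pseudocode."""
    return W - eta * grad_W
```

The gradient computation is omitted; in practice grad_W would come from backpropagation through time on a per-example loss, which is a standard choice here rather than something the report attributes to the paper.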