Can SGD Learn Recurrent Neural Networks with Provable Generalization?
Authors: Zeyuan Allen-Zhu, Yuanzhi Li
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Existing generalization bounds for RNNs scale exponentially with the input length, significantly limiting their practical implications. In this paper, we show that, using vanilla stochastic gradient descent (SGD), RNNs can actually learn some notable concept class efficiently, meaning that both time and sample complexity scale polynomially in the input length (or almost polynomially, depending on the concept). This concept class at least includes functions where each output token is generated from inputs of earlier tokens using a smooth two-layer neural network (see the first sketch after this table). The work is purely theoretical; for instance, the proof of Theorem 1 divides into four conceptual steps. |
| Researcher Affiliation | Collaboration | Zeyuan Allen-Zhu, Microsoft Research AI (zeyuan@csail.mit.edu); Yuanzhi Li, Carnegie Mellon University (yuanzhil@andrew.cmu.edu) |
| Pseudocode | No | The paper describes the vanilla SGD update rule with an equation but does not include any structured pseudocode or algorithm blocks (a hedged sketch of such an update appears after this table). |
| Open Source Code | No | The paper states 'Full version and future updates can be found on https://arxiv.org/abs/1902.01028.', but this link points to the full version of the paper itself; there is no link to, or statement about, the release of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical: it models data as drawn from an unknown distribution D into a training set Z, but it does not name or provide access information for any publicly available dataset. |
| Dataset Splits | No | This is a theoretical paper that conducts no experiments; consequently, it provides no dataset split information for training, validation, or testing. |
| Hardware Specification | No | This is a theoretical paper that describes no experiments; no hardware details are provided. |
| Software Dependencies | No | This is a theoretical paper that describes no experiments; no ancillary software dependencies or version numbers are listed. |
| Experiment Setup | No | This is a theoretical paper that describes no experiments; no experimental setup details or hyperparameters are provided. |
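
The concept class quoted in the Research Type row can be made concrete with a small sketch. The snippet below generates each output token `y_t` from the earlier input tokens via a smooth two-layer network, as the abstract describes; the dimensions, the tanh activation, and the choice to combine earlier tokens by averaging are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# Hypothetical instance of the concept class: each output token y_t is
# produced from input tokens x_1..x_t by a smooth two-layer network.
# Dimensions, tanh, and averaging over earlier tokens are assumptions.
rng = np.random.default_rng(0)

d, m, L = 8, 16, 10          # input dim, hidden width, sequence length
W = rng.normal(size=(m, d))  # first-layer weights of the target network
a = rng.normal(size=m)       # second-layer (output) weights

def smooth_act(z):
    """A smooth activation (tanh); the paper requires smoothness."""
    return np.tanh(z)

def target_output(xs, t):
    """y_t from tokens x_1..x_t; averaging them is an illustrative choice."""
    context = np.mean(xs[: t + 1], axis=0)
    return a @ smooth_act(W @ context)

xs = rng.normal(size=(L, d))                     # one input sequence
ys = [target_output(xs, t) for t in range(L)]    # one output per token
```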
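For the Pseudocode row: the paper gives the vanilla SGD update only as an equation, so the following is a minimal runnable sketch in its spirit, assuming an Elman-style RNN h_t = ReLU(W h_{t-1} + A x_t) with readout y_t = B h_t, squared loss, and training of the recurrent weights W only. The finite-difference gradient stands in for backpropagation through time purely to keep the sketch short; the step size and dimensions are assumptions.

```python
import numpy as np

# Hedged sketch of one vanilla SGD step, W <- W - eta * grad, on an
# Elman-style ReLU RNN. Not the paper's exact algorithm: the gradient is
# approximated by finite differences instead of backprop through time.
rng = np.random.default_rng(0)
d, m, L, eta = 4, 32, 6, 1e-2

A = rng.normal(size=(m, d)) / np.sqrt(m)   # input weights (held fixed)
B = rng.normal(size=(1, m)) / np.sqrt(m)   # output weights (held fixed)
W = rng.normal(size=(m, m)) / np.sqrt(m)   # recurrent weights (trained)

def forward(W, xs):
    """Elman RNN: h_t = ReLU(W h_{t-1} + A x_t), prediction y_t = B h_t."""
    h = np.zeros(m)
    preds = []
    for x in xs:
        h = np.maximum(W @ h + A @ x, 0.0)
        preds.append(B @ h)
    return np.array(preds)

def loss(W, xs, targets):
    """Squared loss over the whole sequence (an illustrative choice)."""
    return 0.5 * np.sum((forward(W, xs).ravel() - targets) ** 2)

def sgd_step(W, xs, targets, eps=1e-5):
    """One SGD update; finite differences keep the sketch autodiff-free."""
    grad = np.zeros_like(W)
    base = loss(W, xs, targets)
    for i in range(m):
        for j in range(m):
            Wp = W.copy()
            Wp[i, j] += eps
            grad[i, j] = (loss(Wp, xs, targets) - base) / eps
    return W - eta * grad

xs = rng.normal(size=(L, d))     # one training sequence
targets = rng.normal(size=L)     # its labels
W = sgd_step(W, xs, targets)     # one vanilla SGD update on one sample
```

In practice the finite-difference loop would be replaced by backpropagation through time or autodiff; it is used here only so that the update W <- W - eta * grad is explicit.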