On the Convergence Rate of Training Recurrent Neural Networks

Authors: Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We show that when the number of neurons is sufficiently large, meaning polynomial in the training data size and in L, SGD is capable of minimizing the regression loss at a linear convergence rate. This gives theoretical evidence of how RNNs can memorize data. More importantly, in this paper we build general toolkits to analyze multi-layer networks with ReLU activations. For instance, we prove why ReLU activations can prevent exponential gradient explosion or vanishing, and build a perturbation theory to analyze first-order approximations of multi-layer networks. (A toy illustration of this setting appears after the table.)
Researcher Affiliation | Collaboration | Zeyuan Allen-Zhu (Microsoft Research AI, zeyuan@csail.mit.edu); Yuanzhi Li (Carnegie Mellon University, yuanzhil@andrew.cmu.edu); Zhao Song (UT-Austin, zhaos@utexas.edu)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states: "Full version and future updates can be found on https://arxiv.org/abs/1810.12065." This link refers to the arXiv paper itself, not to any open-source code for the described methodology. No other statements about code availability are present.
Open Datasets | No | The paper is theoretical and does not report empirical experiments on a specific public dataset. It refers to "training inputs" and "training sequences" as part of the theoretical model setup, but provides no concrete access information (link, DOI, or citation) for a publicly available dataset.
Dataset Splits | No | The paper is theoretical and does not conduct experiments involving dataset splits. While it mentions "training data", there is no discussion of training/validation/test splits for empirical evaluation.
Hardware Specification | No | The paper is theoretical and does not describe any hardware used to run experiments.
Software Dependencies | No | The paper is theoretical and does not provide software dependency details with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe a concrete experimental setup, hyperparameters, or training configuration for empirical reproduction. It defines parameters for the mathematical analysis, but these do not constitute an empirical setup.
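The "Research Type" row above summarizes the paper's main claim: for a ReLU recurrent network whose width is polynomially large in the number of training sequences and in L, gradient-based training drives the regression loss down at a linear (geometric) rate. As a rough illustration only, the PyTorch sketch below instantiates that setting on toy data. The paper itself ships no code, so this is not the authors' implementation; every dimension, the 1/sqrt(m) initialization scale, and the step size are illustrative assumptions rather than the quantities the theorem prescribes, and full-batch gradient descent stands in for the SGD analyzed in the paper.

```python
# A minimal sketch (not the authors' code): an Elman-style RNN with ReLU
# activations, random Gaussian initialization, and gradient descent on an
# L2 regression loss -- the setting the paper's theory studies. All sizes
# below are toy choices, not the polynomial-in-(n, L) width the theorem
# actually requires.
import torch

torch.manual_seed(0)
n, L, d_in, d_out, m = 8, 5, 4, 2, 512   # samples, sequence length, dims; m = hidden width

X = torch.randn(n, L, d_in)
X = X / X.norm(dim=-1, keepdim=True)     # unit-norm inputs, common in such analyses
Y = 0.1 * torch.randn(n, L, d_out)       # toy regression targets

W = (torch.randn(m, m) / m ** 0.5).requires_grad_()      # recurrent weights
A = (torch.randn(m, d_in) / m ** 0.5).requires_grad_()   # input weights
B = torch.randn(d_out, m) / m ** 0.5                     # output layer, held fixed

def regression_loss():
    h = torch.zeros(n, m)
    total = torch.zeros(())
    for t in range(L):
        h = torch.relu(h @ W.T + X[:, t] @ A.T)          # ReLU recurrence
        total = total + 0.5 * ((h @ B.T - Y[:, t]) ** 2).sum()
    return total

opt = torch.optim.SGD([W, A], lr=0.1)    # step size is an illustrative choice
for step in range(501):
    opt.zero_grad()
    loss = regression_loss()
    loss.backward()
    opt.step()
    if step % 100 == 0:
        # With sufficient width, the theory predicts roughly geometric decay.
        print(f"step {step:4d}  loss {loss.item():.6f}")
```

At the polynomially large widths the theorem requires, the printed losses would decay geometrically; at a toy width such as m = 512 the decrease is only expected to roughly follow that shape.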