Understanding Generalization in Recurrent Neural Networks
Authors: Zhuozhuo Tu, Fengxiang He, Dacheng Tao
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments. We now study the effect of random noise on the generalization of RNNs empirically. For simplicity, we consider the IMDB dataset, a collection of 50K movie reviews for binary sentiment classification. We use GloVe word embeddings to map each word to a 50-dimensional vector. We train vanilla RNNs with the ReLU activation function for sequence length L = 100. The corresponding smallest eigenvalue of E(xx^T) is approximated using the total training data, which is 4 × 10⁻⁴. We add Gaussian noise to the input data during training with σ_ε = 0.1, 0.2, 0.3, and 0.4. The generalization error, i.e., the gap between the test error without noise and the training error with noise, for L = 100 and different values of σ_ε is shown in Figure 1 (results for other values of L in Appendix D). We observe that as we start injecting noise, the generalization error improves. But as the standard deviation of the noise keeps growing, the generalization error shows an increasing tendency. This behavior is consistent with the prediction made by our bound. |
| Researcher Affiliation | Academia | Zhuozhuo Tu, Fengxiang He, Dacheng Tao UBTECH Sydney AI Centre, School of Computer Science, Faculty of Engineering The University of Sydney Darlington, NSW 2008, Australia zhtu3055@uni.sydney.edu.au, {fengxiang.he,dacheng.tao}@sydney.edu.au |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology. |
| Open Datasets | No | For simplicity, we consider the IMDB dataset, a collection of 50K movie reviews for binary sentiment classification. The paper mentions the dataset by name but does not provide a specific link, DOI, or a formal citation for accessing it. |
| Dataset Splits | No | The paper mentions using the IMDB dataset for training but does not specify exact training, validation, or test dataset splits or a methodology for creating them. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using 'GloVe word embeddings' but does not specify any software libraries or dependencies with version numbers used for the experiments. |
| Experiment Setup | No | We train vanilla RNNs with the ReLU activation function for sequence length L = 100. The corresponding smallest eigenvalue of E(xx^T) is approximated using the total training data, which is 4 × 10⁻⁴. We add Gaussian noise to the input data during training with σ_ε = 0.1, 0.2, 0.3, and 0.4. These details are insufficient for a complete experimental setup description (e.g., no learning rate, optimizer, epochs, or batch size); a hedged sketch of the setup appears after this table. |
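
To make the reported setup concrete, below is a minimal PyTorch sketch of the noise-injection experiment as described above. Only the embedding dimension (50), sequence length (L = 100), ReLU nonlinearity, and noise levels come from the paper; the hidden size, optimizer, learning rate, and loss are assumptions filled in for illustration, not the authors' actual configuration.

```python
import torch
import torch.nn as nn

# Values taken from the paper: 50-d GloVe embeddings, L = 100,
# ReLU activation, noise levels sigma_eps in {0.1, 0.2, 0.3, 0.4}.
EMBED_DIM = 50
SEQ_LEN = 100
SIGMA_EPS = 0.1
# Assumed (not reported in the paper): hidden size, optimizer, lr.
HIDDEN_DIM = 128

class VanillaRNN(nn.Module):
    """Vanilla RNN with ReLU activation and a binary sentiment head."""
    def __init__(self, embed_dim: int, hidden_dim: int):
        super().__init__()
        self.rnn = nn.RNN(embed_dim, hidden_dim,
                          nonlinearity="relu", batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, SEQ_LEN, EMBED_DIM) sequence of word embeddings
        _, h_n = self.rnn(x)
        return self.head(h_n[-1]).squeeze(-1)  # (batch,) logits

def train_step(model, optimizer, x, y, sigma_eps=SIGMA_EPS):
    """One training step with Gaussian noise injected into the inputs."""
    model.train()
    noisy_x = x + sigma_eps * torch.randn_like(x)
    loss = nn.functional.binary_cross_entropy_with_logits(model(noisy_x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def smallest_eigenvalue(embeddings: torch.Tensor) -> float:
    """Estimate the smallest eigenvalue of E[x x^T] from training data.

    embeddings: (num_tokens, EMBED_DIM) matrix of GloVe vectors; the
    paper reports roughly 4e-4 for this quantity on its data.
    """
    cov = embeddings.T @ embeddings / embeddings.shape[0]
    return torch.linalg.eigvalsh(cov).min().item()

model = VanillaRNN(EMBED_DIM, HIDDEN_DIM)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed
```

The `smallest_eigenvalue` helper mirrors the paper's estimate of the smallest eigenvalue of E(xx^T) over the full set of training embeddings; clean test error versus noisy training error would then give the generalization gap the paper plots against σ_ε.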