Understanding Generalization in Recurrent Neural Networks
Authors: Zhuozhuo Tu, Fengxiang He, Dacheng Tao
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments. We now study the effect of random noise on the generalization of RNNs empirically. For simplicity, we consider the IMDB dataset, a collection of 50K movie reviews for binary sentiment classification. We use GloVe word embeddings to map each word to a 50-dimensional vector. We train vanilla RNNs with the ReLU activation function for sequence length L = 100. The corresponding smallest eigenvalue of E(xx^T) is approximated using the total training data, which is 4 × 10⁻⁴. We add Gaussian noise to the input data during training with σ_ε = 0.1, 0.2, 0.3, and 0.4. The generalization error, i.e., the gap between the test error without noise and the training error with noise, for L = 100 and different values of σ_ε is shown in Figure 1 (results for other values of L in Appendix D). We observe that as we start injecting noise, the generalization error improves. But as the standard deviation of the noise keeps growing, the generalization error shows an increasing tendency. This behavior is consistent with the prediction made by our bound. |
| Researcher Affiliation | Academia | Zhuozhuo Tu, Fengxiang He, Dacheng Tao UBTECH Sydney AI Centre, School of Computer Science, Faculty of Engineering The University of Sydney Darlington, NSW 2008, Australia zhtu3055@uni.sydney.edu.au, {fengxiang.he,dacheng.tao}@sydney.edu.au |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology. |
| Open Datasets | No | For simplicity, we consider the IMDB dataset, a collection of 50K movie reviews for binary sentiment classification. The paper mentions the dataset by name but does not provide a specific link, DOI, or a formal citation for accessing it. |
| Dataset Splits | No | The paper mentions using the IMDB dataset for training but does not specify exact training, validation, or test dataset splits or a methodology for creating them. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using 'GloVe word embeddings' but does not specify any software libraries or dependencies with version numbers used for the experiments. |
| Experiment Setup | No | We train vanilla RNNs with the ReLU activation function for sequence length L = 100. The corresponding smallest eigenvalue of E(xx^T) is approximated using the total training data, which is 4 × 10⁻⁴. We add Gaussian noise to the input data during training with σ_ε = 0.1, 0.2, 0.3, and 0.4. These details are insufficient for a complete experimental setup description (e.g., no learning rate, optimizer, epochs, or batch size); a hedged sketch of the setup appears after this table. |
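
To make the reported setup concrete, below is a minimal PyTorch sketch of the noise-injection experiment as described above. Only the embedding dimension (50), sequence length (L = 100), ReLU nonlinearity, and noise levels come from the paper; the hidden size, optimizer, learning rate, and loss are assumptions filled in for illustration, not the authors' actual configuration.

```python
import torch
import torch.nn as nn

# Values taken from the paper: 50-d GloVe embeddings, L = 100,
# ReLU activation, noise levels sigma_eps in {0.1, 0.2, 0.3, 0.4}.
EMBED_DIM = 50
SEQ_LEN = 100
SIGMA_EPS = 0.1
# Assumed (not reported in the paper): hidden size, optimizer, lr.
HIDDEN_DIM = 128

class VanillaRNN(nn.Module):
    """Vanilla RNN with ReLU activation and a binary sentiment head."""
    def __init__(self, embed_dim: int, hidden_dim: int):
        super().__init__()
        self.rnn = nn.RNN(embed_dim, hidden_dim,
                          nonlinearity="relu", batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, SEQ_LEN, EMBED_DIM) sequence of word embeddings
        _, h_n = self.rnn(x)
        return self.head(h_n[-1]).squeeze(-1)  # (batch,) logits

def train_step(model, optimizer, x, y, sigma_eps=SIGMA_EPS):
    """One training step with Gaussian noise injected into the inputs."""
    model.train()
    noisy_x = x + sigma_eps * torch.randn_like(x)
    loss = nn.functional.binary_cross_entropy_with_logits(model(noisy_x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def smallest_eigenvalue(embeddings: torch.Tensor) -> float:
    """Estimate the smallest eigenvalue of E[x x^T] from training data.

    embeddings: (num_tokens, EMBED_DIM) matrix of GloVe vectors; the
    paper reports roughly 4e-4 for this quantity on its data.
    """
    cov = embeddings.T @ embeddings / embeddings.shape[0]
    return torch.linalg.eigvalsh(cov).min().item()

model = VanillaRNN(EMBED_DIM, HIDDEN_DIM)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed
```

The `smallest_eigenvalue` helper mirrors the paper's estimate of the smallest eigenvalue of E(xx^T) over the full set of training embeddings; clean test error versus noisy training error would then give the generalization gap the paper plots against σ_ε.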