Strongly-Typed Recurrent Neural Networks
Authors: David Balduzzi, Muhammad Ghifary
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments in section 4 show that, despite being more constrained, strongly-typed architectures achieve lower training and comparable generalization error to classical architectures. |
| Researcher Affiliation | Collaboration | ¹Victoria University of Wellington, New Zealand; ²Weta Digital, New Zealand |
| Pseudocode | No | The paper does not contain explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the described methodology. |
| Open Datasets | Yes | We used Leo Tolstoy's War and Peace (WP), which consists of 3,258,246 characters of English text, split into train/val/test sets with 80/10/10 ratios. We used the Penn Treebank (PTB) dataset (Marcus et al., 1993), which consists of 929K training words, 73K validation words, and 82K test words, with a vocabulary size of 10K words. The PTB dataset is publicly available on the web: http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz |
| Dataset Splits | Yes | We used Leo Tolstoy's War and Peace (WP), which consists of 3,258,246 characters of English text, split into train/val/test sets with 80/10/10 ratios. We used the Penn Treebank (PTB) dataset (Marcus et al., 1993), which consists of 929K training words, 73K validation words, and 82K test words. (A split sketch follows the table.) |
| Hardware Specification | Yes | Training on the PTB dataset on an NVIDIA GTX 980 GPU |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., specific deep learning frameworks like TensorFlow or PyTorch, or programming language versions like Python 3.x). |
| Experiment Setup | Yes | Results are reported for two configurations, "64" and "256", which are models with the same number of parameters as a 1-layer LSTM with 64 and 256 cells per layer, respectively. Dropout regularization was only applied to the "256" models; the dropout rate was taken from {0.1, 0.2} based on validation performance. For the medium models, the dropout rate was selected from {0.4, 0.5, 0.6} according to validation performance. (A dropout-selection sketch follows the table.) |
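
The 80/10/10 character-level split quoted above for War and Peace could be reproduced roughly as follows. This is a minimal sketch: the file name, encoding, and the use of a sequential (non-shuffled) split are assumptions, since the paper states only the ratios and the corpus size.

```python
# Hypothetical sketch of the 80/10/10 character-level train/val/test split
# described for War and Peace. File name and sequential splitting are assumptions.
def split_corpus(path="war_and_peace.txt", ratios=(0.8, 0.1, 0.1)):
    with open(path, encoding="utf-8") as f:
        text = f.read()  # the paper reports 3,258,246 characters
    n_train = int(ratios[0] * len(text))
    n_val = int(ratios[1] * len(text))
    train = text[:n_train]
    val = text[n_train:n_train + n_val]
    test = text[n_train + n_val:]
    return train, val, test

# Usage (requires the corpus file to exist locally):
# train, val, test = split_corpus()
```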
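The dropout-rate selection described in the Experiment Setup row amounts to a small validation sweep. The sketch below is an assumption about how that sweep might be organized; the paper releases no code, so the training routine, metric, and function names here are placeholders rather than the authors' implementation.

```python
# Hypothetical sketch of selecting the dropout rate by validation performance.
def select_dropout(candidate_rates, train_and_evaluate):
    """Return the dropout rate with the lowest validation loss/perplexity."""
    best_rate, best_val = None, float("inf")
    for rate in candidate_rates:
        val_metric = train_and_evaluate(dropout=rate)  # lower is better
        if val_metric < best_val:
            best_rate, best_val = rate, val_metric
    return best_rate

# Per the paper: the "256" models chose dropout from {0.1, 0.2}; the medium
# models chose from {0.4, 0.5, 0.6}. `train_and_evaluate` is a placeholder
# for a full training run of the model.
# rate_256 = select_dropout([0.1, 0.2], train_and_evaluate)
# rate_medium = select_dropout([0.4, 0.5, 0.6], train_and_evaluate)
```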