A recurrent neural network without chaos
Authors: Thomas Laurent, James von Brecht
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The final section provides a series of experiments that demonstrate that the CFN achieves results comparable to the LSTM on the word-level language modeling task. |
| Researcher Affiliation | Academia | Thomas Laurent, Department of Mathematics, Loyola Marymount University, Los Angeles, CA 90045, USA, tlaurent@lmu.edu; James von Brecht, Department of Mathematics, California State University, Long Beach, Long Beach, CA 90840, USA, james.vonbrecht@csulb.edu |
| Pseudocode | No | The paper describes the model architecture and its implementation using mathematical equations and textual descriptions, but does not include a formal pseudocode or algorithm block. |
| Open Source Code | No | The paper does not mention releasing open-source code or provide a link to a code repository. |
| Open Datasets | Yes | We use two datasets for these experiments, namely the Penn Treebank corpus (Marcus et al., 1993) and the Text8 corpus (Mikolov et al., 2014). |
| Dataset Splits | Yes | The Penn Treebank Corpus has 1 million words and a vocabulary size of 10,000. We used the code from Zaremba et al. (2014) to construct and split the dataset into a training set (929K words), a validation set (73K words) and a test set (82K words). The Text8 corpus has 100 million characters and a vocabulary size of 44,000. We used the script from Mikolov et al. (2014) to construct and split the dataset into a training set (first 99M characters) and a development set (last 1M characters). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running experiments. |
| Software Dependencies | No | The paper mentions using Python code for dataset construction and a simple online steepest descent algorithm, but it does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | In all experiments, the CFN and LSTM networks are unrolled for T = 35 steps and we take minibatches of size 20. ... We initialize all the weights in the CFN, except for the biases of the gates, uniformly at random in [-0.07, 0.07]. We initialize the biases bθ and bη of the gates to 1 and -1, respectively... The dropout rates p and q are chosen as follows: for the experiments with 20M parameters, we set p = 55% and q = 45% for the CFN and p = 60% and q = 40% for the LSTM; for the experiments with 50M parameters, we set p = 65% and q = 55% for the CFN and p = 70% and q = 50% for the LSTM. ... The initial learning rates were chosen to be lr0 = 7 for the CFN and lr0 = 5 for the LSTM. |
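
As the Pseudocode row notes, the paper specifies the model only through equations. For orientation, the sketch below transcribes the CFN update, h_t = θ_t ⊙ tanh(h_{t-1}) + η_t ⊙ tanh(W x_t), with gates θ_t = σ(U_θ h_{t-1} + V_θ x_t + b_θ) and η_t = σ(U_η h_{t-1} + V_η x_t + b_η). The function and parameter names and the NumPy framing are ours, not the authors' code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cfn_step(x_t, h_prev, params):
    """One step of the Chaos-Free Network (CFN) cell.

    Implements h_t = theta_t * tanh(h_{t-1}) + eta_t * tanh(W @ x_t),
    where theta_t (forget gate) and eta_t (input gate) are sigmoids of
    affine functions of h_{t-1} and x_t. Parameter names are illustrative.
    """
    theta_t = sigmoid(params["U_theta"] @ h_prev + params["V_theta"] @ x_t + params["b_theta"])
    eta_t   = sigmoid(params["U_eta"]   @ h_prev + params["V_eta"]   @ x_t + params["b_eta"])
    h_t = theta_t * np.tanh(h_prev) + eta_t * np.tanh(params["W"] @ x_t)
    return h_t
```

For an input embedding of size d and a hidden state of size n, U_θ and U_η are n×n matrices, V_θ, V_η, and W are n×d matrices, and the biases are length-n vectors.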
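The Dataset Splits row quotes the Text8 split as the first 99M characters for training and the last 1M characters for development. A minimal illustration of that split is below; the file name and helper are assumptions, since the paper itself relies on the script from Mikolov et al. (2014).

```python
def split_text8(path="text8"):
    """Split the Text8 corpus into train (first 99M chars) and dev (last 1M chars).

    Illustrative only; the paper uses the original Mikolov et al. (2014)
    script to construct and split the dataset.
    """
    with open(path, "r") as f:
        corpus = f.read()
    train = corpus[:99_000_000]
    dev = corpus[99_000_000:100_000_000]
    return train, dev
```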
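The Experiment Setup row reports the unroll length, minibatch size, weight-initialization range, gate-bias initialization, dropout rates, and initial learning rates. The snippet below restates the reported values for the 20M-parameter CFN as a configuration sketch; the dictionary layout and helper function are ours, not a reproduction of the authors' training code.

```python
import numpy as np

# Hyperparameters reported for the 20M-parameter CFN (values quoted from the paper;
# the structure and key names are illustrative).
cfn_config_20m = {
    "unroll_steps": 35,            # T = 35
    "batch_size": 20,
    "init_range": (-0.07, 0.07),   # uniform init for all weights except gate biases
    "b_theta_init": 1.0,           # forget-gate bias
    "b_eta_init": -1.0,            # input-gate bias
    "dropout_p": 0.55,
    "dropout_q": 0.45,
    "initial_lr": 7.0,
}

def init_weight(shape, low=-0.07, high=0.07, rng=np.random.default_rng()):
    """Uniform initialization in [-0.07, 0.07], as reported for the CFN weights."""
    return rng.uniform(low, high, size=shape)
```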