A recurrent neural network without chaos

Authors: Thomas Laurent, James von Brecht

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The final section provides a series of experiments demonstrating that the CFN achieves results comparable to the LSTM on the word-level language modeling task.
Researcher Affiliation | Academia | Thomas Laurent, Department of Mathematics, Loyola Marymount University, Los Angeles, CA 90045, USA, tlaurent@lmu.edu; James von Brecht, Department of Mathematics, California State University, Long Beach, Long Beach, CA 90840, USA, james.vonbrecht@csulb.edu
Pseudocode | No | The paper describes the model architecture and its implementation using mathematical equations and textual descriptions, but does not include formal pseudocode or an algorithm block.
Open Source Code | No | The paper does not mention releasing open-source code or provide a link to a code repository.
Open Datasets | Yes | We use two datasets for these experiments, namely the Penn Treebank corpus (Marcus et al., 1993) and the Text8 corpus (Mikolov et al., 2014).
Dataset Splits | Yes | The Penn Treebank Corpus has 1 million words and a vocabulary size of 10,000. We used the code from Zaremba et al. (2014) to construct and split the dataset into a training set (929K words), a validation set (73K words) and a test set (82K words). The Text8 corpus has 100 million characters and a vocabulary size of 44,000. We used the script from Mikolov et al. (2014) to construct and split the dataset into a training set (first 99M characters) and a development set (last 1M characters). (A minimal split sketch is given after this table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions using Python code for dataset construction and a simple online steepest descent algorithm, but it does not specify any software dependencies with version numbers.
Experiment Setup | Yes | In all experiments, the CFN and LSTM networks are unrolled for T = 35 steps and we take minibatches of size 20. ... We initialize all the weights in the CFN, except for the biases of the gates, uniformly at random in [-0.07, 0.07]. We initialize the biases bθ and bη of the gates to 1 and -1, respectively... The dropout rates p and q are chosen as follows: for the experiments with 20M parameters, we set p = 55% and q = 45% for the CFN and p = 60% and q = 40% for the LSTM; for the experiments with 50M parameters, we set p = 65% and q = 55% for the CFN and p = 70% and q = 50% for the LSTM. ... The initial learning rates were chosen to be lr0 = 7 for the CFN and lr0 = 5 for the LSTM.
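
The dataset splits quoted in the table can be reproduced with a few lines of Python. The sketch below is illustrative only: the file names and the whitespace tokenization are assumptions, and the paper itself relies on the construction scripts of Zaremba et al. (2014) and Mikolov et al. (2014) rather than this code.

# Illustrative sketch of the dataset splits described above; file names
# and tokenization are assumptions, not the authors' scripts.

def load_ptb(path):
    # Penn Treebank: pre-split word-level files,
    # roughly 929K training / 73K validation / 82K test words.
    splits = {}
    for name in ("train", "valid", "test"):
        with open(f"{path}/ptb.{name}.txt") as f:
            splits[name] = f.read().split()
    return splits

def load_text8(path):
    # Text8: 100M characters in a single file;
    # first 99M characters for training, last 1M for development.
    with open(f"{path}/text8") as f:
        text = f.read()
    return {"train": text[:99_000_000], "dev": text[99_000_000:]}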
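
The initialization and hyperparameters quoted in the Experiment Setup row can likewise be summarized in a short sketch. The paper does not state its software framework, so PyTorch is used here purely for illustration, and the parameter names b_theta and b_eta are assumptions about how the gate biases might be registered in a CFN module.

import torch.nn as nn

# Hyperparameters quoted above (dropout rates shown for the 20M-parameter models).
UNROLL_STEPS = 35                # T: truncated backpropagation length
BATCH_SIZE = 20                  # minibatch size
LR0_CFN, LR0_LSTM = 7.0, 5.0     # initial learning rates
P_CFN, Q_CFN = 0.55, 0.45        # CFN dropout rates (20M parameters)
P_LSTM, Q_LSTM = 0.60, 0.40      # LSTM dropout rates (20M parameters)

def init_cfn_weights(module: nn.Module) -> None:
    # All CFN weights uniform in [-0.07, 0.07], except the gate biases,
    # which are set to 1 (b_theta) and -1 (b_eta).
    for name, param in module.named_parameters():
        if name.endswith("b_theta"):
            nn.init.constant_(param, 1.0)
        elif name.endswith("b_eta"):
            nn.init.constant_(param, -1.0)
        else:
            nn.init.uniform_(param, -0.07, 0.07)

Training then uses the simple online steepest descent (plain SGD) mentioned in the Software Dependencies row, starting from the initial learning rate lr0.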