Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets

Authors: Armand Joulin, Tomas Mikolov

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | First, we consider various sequences generated by simple algorithms, where the goal is to learn their generation rule [3, 12, 29]. We hope to understand the scope of algorithmic patterns each model can capture. We also evaluate the models on a standard language modeling dataset, Penn Treebank. From the caption of Table 2: the sequences seen during training are such that n < 20 (and n + m < 20), and testing uses sequences up to n = 60; the reported score is the percentage of lengths n for which the model correctly predicts the sequence, so performance above 33.3% (the seen lengths n < 20 cover roughly a third of the tested range) means the model generalizes to never-seen sequence lengths. (A sketch of this evaluation protocol follows after the table.)
Researcher Affiliation | Industry | Armand Joulin, Facebook AI Research, 770 Broadway, New York, USA (ajoulin@fb.com); Tomas Mikolov, Facebook AI Research, 770 Broadway, New York, USA (tmikolov@fb.com)
Pseudocode | No | The paper uses mathematical equations to describe the models but does not include structured pseudocode or algorithm blocks. (A hedged code sketch of the stack update appears after the table.)
Open Source Code | Yes | The code is available at https://github.com/facebook/Stack-RNN
Open Datasets | Yes | We also evaluate the models on a standard language modeling dataset, Penn Treebank.
Dataset Splits | Yes | Our experimental setting is the following: the training and validation set are composed of sequences generated with n up to N < 20 while the test set is composed of sequences generated with n up to 60. (A data-generation sketch follows after the table.)
Hardware Specification | No | The paper does not provide any specific hardware details, such as GPU/CPU models or processor types, used for running its experiments.
Software Dependencies | No | The paper mentions the algorithms and models used (e.g., SGD, LSTM, RNN) but does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | Yes | Stack and List RNNs are trained with SGD and backpropagation through time with 50 steps [32], a hard clipping of 15 to prevent gradient explosions [23], and an initial learning rate of 0.1. The learning rate is divided by 2 each time the entropy on the validation set is not decreasing. The depth k defined in Eq. (6) is set to 2. The free parameters are the number of hidden units, stacks and the use of NO-OP. The baselines are RNNs with 40, 100 and 500 units, and LSTMs with 1 and 2 layers with 50, 100 and 200 units. The hyper-parameters of the baselines are selected on the validation sets. (A training-loop sketch follows below.)
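
For concreteness, the length-generalization metric from the Table 2 caption can be computed as below. This is a minimal sketch, not the authors' code; `model_predicts_correctly` is a hypothetical stand-in for running the trained model on the sequence of length n.

```python
def generalization_score(model_predicts_correctly, n_max=60):
    """Table 2 metric: percentage of lengths n in 1..n_max predicted correctly.

    Training only covers n < 20, i.e. roughly a third of the tested lengths,
    so a score above 33.3% implies correct predictions on unseen lengths.
    """
    correct = sum(model_predicts_correctly(n) for n in range(1, n_max + 1))
    return 100.0 * correct / n_max
```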
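Since the paper gives only equations and no pseudocode, the following is a minimal NumPy sketch of one step of the continuous stack update it describes, simplified to two actions (PUSH and POP, omitting the optional NO-OP); the weight names U, R, P, A, D are illustrative, not the paper's notation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def stack_rnn_step(x, h_prev, s_prev, U, R, P, A, D):
    """One step of a stack-augmented RNN with soft PUSH/POP actions.

    x      : input vector
    h_prev : previous hidden state
    s_prev : continuous stack, 1-D array (index 0 is the top)
    """
    k = P.shape[1]                      # depth k: top-k stack cells fed back
    # Hidden state reads the input, its own past, and the top of the stack.
    h = sigmoid(U @ x + R @ h_prev + P @ s_prev[:k])
    # Differentiable action distribution: a[0] = PUSH, a[1] = POP.
    a = softmax(A @ h)
    s = np.empty_like(s_prev)
    # Top cell: a newly pushed value, or the element a pop would expose.
    s[0] = a[0] * sigmoid(D @ h) + a[1] * s_prev[1]
    # Interior cells shift down on a push and up on a pop.
    s[1:-1] = a[0] * s_prev[:-2] + a[1] * s_prev[2:]
    s[-1] = a[0] * s_prev[-2]           # popping past the bottom reads zeros
    return h, s
```

In the paper the depth k of Eq. (6) is set to 2, and the optional NO-OP adds a third softmax output that leaves the stack unchanged.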
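The split described under Dataset Splits is straightforward to reproduce for a counting task; the sketch below assumes a^n b^n as a concrete example (the paper also uses other generation rules, and the n + m < 20 constraint applies to its two-counter tasks).

```python
import random

def anbn(n):
    """One sequence from the counting language a^n b^n."""
    return "a" * n + "b" * n

# Training and validation draw n below 20; the test set covers n up to 60,
# so roughly two thirds of the test lengths were never seen during training.
train = [anbn(random.randint(1, 19)) for _ in range(10000)]
valid = [anbn(random.randint(1, 19)) for _ in range(1000)]
test  = [anbn(n) for n in range(1, 61)]
```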
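Finally, the optimization protocol quoted under Experiment Setup maps onto a short SGD loop. This is a hedged PyTorch sketch, not the released Torch/C++ code; `model`, `train_batches`, and `validation_entropy` are hypothetical stand-ins.

```python
import torch

def train(model, train_batches, validation_entropy, epochs=50,
          lr=0.1, clip=15.0):
    """SGD with hard gradient clipping at 15 and an initial learning rate
    of 0.1, halved whenever the validation entropy stops decreasing.

    `train_batches` is assumed to yield (inputs, targets) pairs already cut
    into backpropagation-through-time windows of 50 steps.
    """
    best = float("inf")
    for _ in range(epochs):
        for inputs, targets in train_batches:
            model.zero_grad()
            loss = model.loss(inputs, targets)   # hypothetical loss method
            loss.backward()
            # Hard clipping to prevent gradient explosions.
            torch.nn.utils.clip_grad_value_(model.parameters(), clip)
            with torch.no_grad():
                for p in model.parameters():
                    p -= lr * p.grad             # plain SGD update
        entropy = validation_entropy(model)      # per-symbol entropy on valid
        if entropy >= best:
            lr /= 2                              # halve lr when entropy stalls
        best = min(best, entropy)
    return model
```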