Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets
Authors: Armand Joulin, Tomas Mikolov
NIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | First, we consider various sequences generated by simple algorithms, where the goal is to learn their generation rule [3, 12, 29]. We hope to understand the scope of algorithmic patterns each model can capture. We also evaluate the models on a standard language modeling dataset, Penn Treebank. … Table 2: Comparison with RNN and LSTM on sequences generated by counting algorithms. The sequences seen during training are such that n < 20 (and n + m < 20), and we test on sequences up to n = 60. We report the percent of n for which the model was able to correctly predict the sequences. Performance above 33.3% means it is able to generalize to never seen sequence lengths. (A sketch of this length-split evaluation protocol appears after the table.) |
| Researcher Affiliation | Industry | Armand Joulin, Facebook AI Research, 770 Broadway, New York, USA (ajoulin@fb.com); Tomas Mikolov, Facebook AI Research, 770 Broadway, New York, USA (tmikolov@fb.com) |
| Pseudocode | No | The paper uses mathematical equations to describe the models but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/facebook/Stack-RNN |
| Open Datasets | Yes | We also evaluate the models on a standard language modeling dataset, Penn Treebank. |
| Dataset Splits | Yes | Our experimental setting is the following: the training and validation set are composed of sequences generated with n up to N < 20 while the test set is composed of sequences generated with n up to 60. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models or processor types used for running its experiments. |
| Software Dependencies | No | The paper mentions algorithms and models used (e.g., SGD, LSTM, RNN) but does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | Stack and List RNNs are trained with SGD and backpropagation through time with 50 steps [32], a hard clipping of 15 to prevent gradient explosions [23], and an initial learning rate of 0.1. The learning rate is divided by 2 each time the entropy on the validation set is not decreasing. The depth k defined in Eq. (6) is set to 2. The free parameters are the number of hidden units, stacks and the use of NO-OP. The baselines are RNNs with 40, 100 and 500 units, and LSTMs with 1 and 2 layers with 50, 100 and 200 units. The hyper-parameters of the baselines are selected on the validation sets. (Sketches of the stack update and of this training recipe appear below the table.) |
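For concreteness, here is a minimal sketch of the length-split protocol quoted in the Research Type and Dataset Splits rows, assuming the simplest counting task a^n b^n. The helper names (`anbn`, `make_split`) are hypothetical illustrations, not taken from the released Stack-RNN code:

```python
import random

def anbn(n):
    """One a^n b^n sequence, the simplest of the counting tasks."""
    return ["a"] * n + ["b"] * n

def make_split(n_max, num_sequences, seed=0):
    """Sample sequences whose n is drawn uniformly below n_max."""
    rng = random.Random(seed)
    return [anbn(rng.randint(1, n_max - 1)) for _ in range(num_sequences)]

# Training and validation use n < 20; the test set goes up to n = 60,
# so correct predictions there require generalizing past seen lengths.
train = make_split(20, 1000)
valid = make_split(20, 100, seed=1)
test = [anbn(n) for n in range(1, 61)]
```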
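The setup also fixes the read depth k of Eq. (6) to 2 and optionally adds a NO-OP action. Below is a NumPy sketch of one step of a continuous stack update in that spirit: the hidden state reads the top k stack cells, and PUSH/POP/NO-OP are applied as soft mixtures. The matrix names loosely follow the paper's notation, and the released implementation may differ:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def stack_rnn_step(x, h_prev, s_prev, U, R, P, A, D, k=2):
    """One soft stack-RNN step: the hidden state reads the top-k stack
    cells (k = 2 in the experiments), then emits a PUSH/POP/NO-OP mixture
    that updates the stack differentiably."""
    h = sigmoid(U @ x + R @ h_prev + P @ s_prev[:k])
    push, pop, noop = softmax(A @ h)  # soft action probabilities
    s = np.empty_like(s_prev)
    # New top cell: pushed value, the old second cell (POP), or old top.
    s[0] = push * sigmoid(D @ h) + pop * s_prev[1] + noop * s_prev[0]
    # Interior cells shift down on PUSH, up on POP, stay put on NO-OP.
    s[1:-1] = push * s_prev[:-2] + pop * s_prev[2:] + noop * s_prev[1:-1]
    s[-1] = push * s_prev[-2] + noop * s_prev[-1]  # POP reads 0 past the end
    return h, s
```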
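Finally, a hedged PyTorch-style sketch of the optimization recipe quoted in the Experiment Setup row: SGD over 50-step BPTT windows, hard gradient clipping at 15, and a learning rate of 0.1 halved whenever validation entropy stops decreasing. The batch format and the `model` interface are assumptions for illustration, not the released code's API:

```python
import torch

def evaluate(model, batches):
    """Mean cross-entropy (validation entropy) over a list of
    (inputs, targets) batches."""
    loss_fn = torch.nn.CrossEntropyLoss()
    total = 0.0
    with torch.no_grad():
        for inputs, targets in batches:
            logits = model(inputs)  # assumed shape: (batch, steps, vocab)
            total += loss_fn(logits.reshape(-1, logits.size(-1)),
                             targets.reshape(-1)).item()
    return total / len(batches)

def train(model, train_batches, valid_batches, epochs=20):
    lr, best = 0.1, float("inf")  # initial learning rate of 0.1
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        for inputs, targets in train_batches:  # 50-step BPTT windows
            optimizer.zero_grad()
            logits = model(inputs)
            loss_fn(logits.reshape(-1, logits.size(-1)),
                    targets.reshape(-1)).backward()
            # Hard clipping: each gradient component limited to [-15, 15].
            torch.nn.utils.clip_grad_value_(model.parameters(), 15.0)
            optimizer.step()
        entropy = evaluate(model, valid_batches)
        if entropy >= best:
            lr /= 2  # halve lr when validation entropy stops decreasing
        best = min(best, entropy)
```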