Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks
Authors: Yikang Shen, Shawn Tan, Alessandro Sordoni, Aaron Courville
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our novel recurrent architecture, ordered neurons LSTM (ON-LSTM), achieves good performance on four different tasks: language modeling, unsupervised parsing, targeted syntactic evaluation, and logical inference. |
| Researcher Affiliation | Collaboration | Yikang Shen, Mila/Université de Montréal and Microsoft Research Montréal, Canada; Shawn Tan, Mila/Université de Montréal, Montréal, Canada; Alessandro Sordoni, Microsoft Research Montréal, Canada; Aaron Courville, Mila/Université de Montréal, Montréal, Canada |
| Pseudocode | No | The paper provides mathematical equations for the model's operations but does not include a distinct section or figure labeled as "Pseudocode" or "Algorithm". |
| Open Source Code | Yes | The code can be found at https://github.com/yikangshen/Ordered-Neurons. |
| Open Datasets | Yes | We evaluate our model by measuring perplexity on the Penn Tree Bank (PTB) (Marcus et al., 1993; Mikolov, 2012) task. We take our best model for the language modeling task, and test it on WSJ10 dataset and WSJ test set. ... we train both our ON-LSTM model and a baseline LSTM language model on a 90 million word subset of Wikipedia. |
| Dataset Splits | Yes | We manually searched some of the dropout values for ON-LSTM based on the validation performance. The train/test split is as described in the original codebase, and 10% of the training set is set aside as the validation set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using a codebase from a prior work (AWD-LSTM) for hyper-parameters and references the Marvin & Linzen (2018) codebase for syntactic evaluation, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | Our model uses a three-layer ON-LSTM model with 1150 units in the hidden layer and an embedding of size 400. For master gates, the downsize factor C = 10. The values used for dropout on the word vectors, the output between LSTM layers, the output of the final LSTM layer, and embedding dropout were (0.5, 0.3, 0.45, 0.1), respectively. A weight-dropout of 0.45 was applied to the recurrent weight matrices. For the targeted syntactic evaluation, both language models have two layers of 650 units, a batch size of 128, a dropout rate of 0.2, a learning rate of 20.0, and were trained for 40 epochs. The input embeddings have 200 dimensions and the output embeddings have 650 dimensions. (See the illustrative sketch after this table.) |
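
The sketch below illustrates how the reported master-gate downsize factor C = 10 interacts with the layer sizes from the experiment setup (1150 hidden units, 400-dimensional embeddings). It is a minimal PyTorch approximation, not the authors' released implementation; the class and parameter names (`MasterGates`, `downsize`) are illustrative.

```python
# Minimal sketch (assumed names, not the authors' code): master gates of an
# ON-LSTM layer, showing how the downsize factor C shrinks the master gates.
import torch
import torch.nn as nn
import torch.nn.functional as F

def cumax(x, dim=-1):
    """cumax(x) = cumsum(softmax(x)): monotonically non-decreasing values in (0, 1)."""
    return torch.cumsum(F.softmax(x, dim=dim), dim=dim)

class MasterGates(nn.Module):
    """Master forget/input gates of ON-LSTM with a downsize factor C."""
    def __init__(self, input_size, hidden_size, downsize=10):
        super().__init__()
        assert hidden_size % downsize == 0
        self.n_chunks = hidden_size // downsize   # e.g. 1150 / 10 = 115
        self.downsize = downsize
        self.proj = nn.Linear(input_size + hidden_size, 2 * self.n_chunks)

    def forward(self, x_t, h_prev):
        z = self.proj(torch.cat([x_t, h_prev], dim=-1))
        f_logits, i_logits = z.chunk(2, dim=-1)
        # Master forget gate rises from 0 toward 1; master input gate falls from 1 toward 0.
        master_f = cumax(f_logits)
        master_i = 1.0 - cumax(i_logits)
        # Each master-gate dimension controls a contiguous block of `downsize` hidden units.
        master_f = master_f.repeat_interleave(self.downsize, dim=-1)
        master_i = master_i.repeat_interleave(self.downsize, dim=-1)
        return master_f, master_i

# Usage with the sizes reported for the PTB language model:
# gates = MasterGates(input_size=400, hidden_size=1150, downsize=10)
```

With C = 10, the master gates are computed over 115 chunks rather than all 1150 hidden units, which is the computational saving the downsize factor is meant to provide.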