Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks

Authors: Yikang Shen, Shawn Tan, Alessandro Sordoni, Aaron Courville

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our novel recurrent architecture, ordered neurons LSTM (ON-LSTM), achieves good performance on four different tasks: language modeling, unsupervised parsing, targeted syntactic evaluation, and logical inference.
Researcher Affiliation | Collaboration | Yikang Shen (Mila/Université de Montréal and Microsoft Research, Montréal, Canada); Shawn Tan (Mila/Université de Montréal, Montréal, Canada); Alessandro Sordoni (Microsoft Research, Montréal, Canada); Aaron Courville (Mila/Université de Montréal, Montréal, Canada)
Pseudocode | No | The paper provides mathematical equations for the model's operations but does not include a distinct section or figure labeled as "Pseudocode" or "Algorithm". (A hedged sketch of the equations' core update is given after the table.)
Open Source Code | Yes | The code can be found at https://github.com/yikangshen/Ordered-Neurons.
Open Datasets | Yes | We evaluate our model by measuring perplexity on the Penn Treebank (PTB) (Marcus et al., 1993; Mikolov, 2012) task. We take our best model for the language modeling task and test it on the WSJ10 dataset and the WSJ test set. ... we train both our ON-LSTM model and a baseline LSTM language model on a 90 million word subset of Wikipedia.
Dataset Splits | Yes | We manually searched some of the dropout values for ON-LSTM based on the validation performance. The train/test split is as described in the original codebase, and 10% of the training set is set aside as the validation set. (See the split sketch after the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions using the codebase from a prior work (AWD-LSTM) for hyper-parameters and references the Marvin & Linzen (2018) codebase for the syntactic evaluation, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup | Yes | Our model uses a three-layer ON-LSTM model with 1150 units in the hidden layer and an embedding of size 400. For master gates, the downsize factor C = 10. The values used for dropout on the word vectors, the output between LSTM layers, the output of the final LSTM layer, and embedding dropout were (0.5, 0.3, 0.45, 0.1) respectively. A weight-dropout of 0.45 was applied to the recurrent weight matrices. Both language models for the targeted syntactic evaluation have two layers of 650 units, a batch size of 128, a dropout rate of 0.2, a learning rate of 20.0, and were trained for 40 epochs. The input embeddings have 200 dimensions and the output embeddings have 650 dimensions. (A configuration summary follows the table.)
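Since the paper presents the model as equations rather than pseudocode, the following is a minimal PyTorch-style sketch of the cumax activation and the master-gate cell update those equations describe. The function names (`cumax`, `on_lstm_cell_update`) are illustrative, the chunking by the downsize factor C is omitted, and the released repository linked above is the authoritative implementation.

```python
import torch
import torch.nn.functional as F

def cumax(x, dim=-1):
    # cumax(x) = cumsum(softmax(x)): a monotonically increasing vector in
    # [0, 1], used to build the master gates.
    return torch.cumsum(F.softmax(x, dim=dim), dim=dim)

def on_lstm_cell_update(f, i, o, c_hat, f_master_logits, i_master_logits, c_prev):
    # Sketch of the ON-LSTM cell update described by the paper's equations.
    # f, i, o, c_hat are the standard LSTM gates and candidate cell state;
    # all tensors are (batch, hidden). Chunked master gates are omitted here.
    f_master = cumax(f_master_logits)          # master forget gate (increasing)
    i_master = 1.0 - cumax(i_master_logits)    # master input gate (decreasing)
    omega = f_master * i_master                # overlap of the two master gates
    f_hat = f * omega + (f_master - omega)     # higher-level neurons retain history
    i_hat = i * omega + (i_master - omega)     # lower-level neurons absorb new input
    c = f_hat * c_prev + i_hat * c_hat
    h = o * torch.tanh(c)
    return h, c
```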
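For the dataset-splits row, this is a hypothetical illustration of holding out 10% of the training set as validation; `hold_out_validation` and `train_sentences` are placeholder names, not taken from the released code.

```python
import random

def hold_out_validation(train_sentences, valid_fraction=0.1, seed=0):
    # Reserve a fraction (default 10%) of the training sentences as a
    # validation set, mirroring the split described in the table above.
    rng = random.Random(seed)
    indices = list(range(len(train_sentences)))
    rng.shuffle(indices)
    n_valid = int(len(indices) * valid_fraction)
    valid = [train_sentences[i] for i in indices[:n_valid]]
    train = [train_sentences[i] for i in indices[n_valid:]]
    return train, valid
```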
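The hyperparameters quoted in the experiment-setup row, collected into plain Python dicts for reference. The key names are illustrative summaries and do not correspond to command-line flags of the released code.

```python
# PTB language-modeling setup reported in the experiment-setup row.
ptb_on_lstm_config = {
    "num_layers": 3,
    "hidden_size": 1150,
    "embedding_size": 400,
    "chunk_factor_C": 10,          # master-gate downsize factor
    "dropout_word_vectors": 0.5,
    "dropout_between_layers": 0.3,
    "dropout_final_output": 0.45,
    "dropout_embedding": 0.1,
    "weight_drop_recurrent": 0.45,
}

# Targeted syntactic evaluation: both the ON-LSTM and the baseline LSTM
# language models share these settings.
syntactic_eval_config = {
    "num_layers": 2,
    "hidden_size": 650,
    "batch_size": 128,
    "dropout": 0.2,
    "learning_rate": 20.0,
    "epochs": 40,
    "input_embedding_size": 200,
    "output_embedding_size": 650,
}
```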