Towards Non-Saturating Recurrent Units for Modelling Long-Term Dependencies

Authors: Sarath Chandar, Chinnadhurai Sankar, Eugene Vorontsov, Samira Ebrahimi Kahou, Yoshua Bengio

AAAI 2019, pp. 3280-3287

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In a series of synthetic and real world tasks, we demonstrate that the proposed model is the only model that performs among the top 2 models across all tasks with and without long-term dependencies, when compared against a range of other architectures.
Researcher Affiliation | Collaboration | 1 Mila, Université de Montréal; 2 Google Brain; 3 Microsoft Research
Pseudocode | No | The paper describes the model mathematically using equations (9-17) but does not include structured pseudocode or an algorithm block.
Open Source Code | Yes | The code for NRU Cell is available at https://github.com/apsarath/NRU.
Open Datasets | Yes | character level language modelling with the Penn Treebank Corpus (PTB) Marcus, Santorini, and Marcinkiewicz (1993).
Dataset Splits | Yes | All models were trained for 20 epochs and evaluated on the test set after selecting for each the model state which yields the lowest BPC on the validation set.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) used for running experiments are provided.
Software Dependencies | No | The paper mentions using the 'Adam optimizer' but does not provide specific software names with version numbers for reproducibility.
Experiment Setup | Yes | We used the Adam optimizer Kingma and Ba (2014) with a default learning rate of 0.001 in all our experiments. We clipped the gradients by norm value of 1 for all models except GORU and EURNN since their transition operators do not expand norm. We used a batch size of 10 for most tasks, unless otherwise stated.
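
The Experiment Setup and Dataset Splits rows above describe a fairly standard training recipe: Adam with a learning rate of 0.001, gradient clipping by norm 1, a batch size of 10, 20 training epochs, and final test evaluation of the checkpoint with the lowest validation BPC. The sketch below is a minimal PyTorch illustration of that recipe only; it is not the authors' released code, and `model`, `train_loader`, `val_loader`, and `evaluate_bpc` are placeholder names assumed for illustration.

```python
# Minimal sketch of the reported setup: Adam (lr = 0.001), gradient clipping
# by norm 1, batch size 10, 20 epochs, and model selection by lowest
# validation bits-per-character (BPC). All names below are placeholders,
# not names from the NRU repository.
import copy
import math

import torch
import torch.nn as nn


def evaluate_bpc(model, loader, device):
    """Average cross-entropy over `loader`, converted to bits per character."""
    model.eval()
    criterion = nn.CrossEntropyLoss(reduction="sum")
    total_loss, total_chars = 0.0, 0
    with torch.no_grad():
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            logits = model(inputs)  # (batch, seq, vocab)
            total_loss += criterion(logits.reshape(-1, logits.size(-1)),
                                    targets.reshape(-1)).item()
            total_chars += targets.numel()
    return total_loss / total_chars / math.log(2)  # nats -> bits per character


def train(model, train_loader, val_loader, device="cpu"):
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # default lr from the paper

    best_bpc, best_state = float("inf"), None
    for epoch in range(20):  # "All models were trained for 20 epochs"
        model.train()
        for inputs, targets in train_loader:  # loader assumed to yield batches of 10
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            logits = model(inputs)
            loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
            loss.backward()
            # Clip gradients by norm value of 1 (the paper skips this for GORU/EURNN).
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()

        # Keep the model state with the lowest validation BPC for test evaluation.
        val_bpc = evaluate_bpc(model, val_loader, device)
        if val_bpc < best_bpc:
            best_bpc = val_bpc
            best_state = copy.deepcopy(model.state_dict())

    model.load_state_dict(best_state)  # this state is then evaluated on the test set
    return model
```

For the released implementation of the NRU cell itself, see https://github.com/apsarath/NRU.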