Towards Non-Saturating Recurrent Units for Modelling Long-Term Dependencies
Authors: Sarath Chandar, Chinnadhurai Sankar, Eugene Vorontsov, Samira Ebrahimi Kahou, Yoshua Bengio
AAAI 2019, pp. 3280-3287
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In a series of synthetic and real world tasks, we demonstrate that the proposed model is the only model that performs among the top 2 models across all tasks with and without long-term dependencies, when compared against a range of other architectures. |
| Researcher Affiliation | Collaboration | ¹Mila, Université de Montréal, ²Google Brain, ³Microsoft Research |
| Pseudocode | No | The paper describes the model mathematically using Equations (9)-(17) but does not include structured pseudocode or an algorithm block; a simplified code sketch of the cell is given after this table. |
| Open Source Code | Yes | The code for NRU Cell is available at https://github.com/apsarath/NRU. |
| Open Datasets | Yes | character-level language modelling with the Penn Treebank corpus (PTB) (Marcus, Santorini, and Marcinkiewicz 1993). |
| Dataset Splits | Yes | All models were trained for 20 epochs and evaluated on the test set after selecting for each the model state which yields the lowest BPC on the validation set. |
| Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) used for running experiments are provided. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' but does not provide specific software names with version numbers for reproducibility. |
| Experiment Setup | Yes | We used the Adam optimizer Kingma and Ba (2014) with a default learning rate of 0.001 in all our experiments. We clipped the gradients by norm value of 1 for all models except GORU and EURNN since their transition operators do not expand norm. We used a batch size of 10 for most tasks, unless otherwise stated. (A minimal training-loop sketch with these settings follows the table.) |
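
Since the paper provides no pseudocode (see the Pseudocode row above), the following is a minimal PyTorch sketch of a simplified non-saturating cell in the spirit of Equations (9)-(17). It is an approximation under stated assumptions, not the paper's exact formulation: the class name `SimpleNRUCell` and all layer names are ours, and the actual NRU additionally factorizes the write/erase vectors via outer products of smaller vectors. Consult the official repository (https://github.com/apsarath/NRU) for the real cell.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleNRUCell(nn.Module):
    """Simplified sketch of a non-saturating recurrent unit (not the
    paper's exact Eqs. 9-17). The hidden state uses a ReLU instead of
    saturating tanh/sigmoid activations, and a separate memory vector
    is updated additively via normalized write/erase vectors."""

    def __init__(self, input_size, hidden_size, memory_size):
        super().__init__()
        self.hidden = nn.Linear(input_size + hidden_size + memory_size, hidden_size)
        self.write = nn.Linear(hidden_size + input_size, memory_size)
        self.erase = nn.Linear(hidden_size + input_size, memory_size)

    def forward(self, x, h_prev, m_prev):
        # Non-saturating hidden-state update (ReLU, not tanh).
        h = F.relu(self.hidden(torch.cat([x, h_prev, m_prev], dim=-1)))
        hx = torch.cat([h, x], dim=-1)
        # Unit-norm write/erase vectors keep the additive update stable.
        v_w = F.normalize(self.write(hx), dim=-1)
        v_e = F.normalize(self.erase(hx), dim=-1)
        # Purely additive memory update: no gates, no saturation.
        m = m_prev + v_w - v_e
        return h, m
```

The design point the sketch illustrates is the paper's central one: because both the hidden-state activation and the memory update avoid saturating nonlinearities, gradients flowing back through time are not squashed by gate saturation.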
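Below is a short training-loop sketch reflecting the quoted experiment setup (Adam with the default learning rate of 0.001, gradient clipping by norm 1, batch size 10). The model and data are placeholders rather than the paper's tasks; only the optimizer choice, learning rate, clip norm, and batch size come from the paper.

```python
import torch
import torch.nn as nn

# Placeholder model and synthetic data; only the hyperparameters below
# are taken from the paper's stated setup.
model = nn.LSTM(input_size=8, hidden_size=32, batch_first=True)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # default Adam LR

batch_size = 10  # "batch size of 10 for most tasks"
inputs = torch.randn(batch_size, 50, 8)
targets = torch.randn(batch_size, 50, 32)

for step in range(100):
    optimizer.zero_grad()
    outputs, _ = model(inputs)
    loss = nn.functional.mse_loss(outputs, targets)
    loss.backward()
    # Clip gradients by global norm 1, as in the paper (skipped there
    # for norm-preserving models such as GORU and EURNN).
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```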