Variable Computation in Recurrent Neural Networks
Authors: Yacine Jernite, Edouard Grave, Armand Joulin, Tomas Mikolov
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show experimentally that not only do our models require fewer operations, they also lead to better performance overall on evaluation tasks. |
| Researcher Affiliation | Collaboration | Yacine Jernite Department of Computer Science New York University New York, NY 10012, USA jernite@cs.nyu.edu Edouard Grave, Armand Joulin & Tomas Mikolov Facebook AI Research New York, NY 10003, USA {egrave,ajoulin,tmikolov}@fb.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes the models using mathematical equations and prose. |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described in this paper, nor does it provide a direct link to a source-code repository. |
| Open Datasets | Yes | We downloaded a corpus of Irish traditional tunes from https://thesession.org and split them into a training validation and test of 16,000 (2.4M tokens), 1,511 (227,000 tokens) and 2,000 (288,000 tokens) melodies respectively. |
| Dataset Splits | Yes | We downloaded a corpus of Irish traditional tunes from https://thesession.org and split them into a training validation and test of 16,000 (2.4M tokens), 1,511 (227,000 tokens) and 2,000 (288,000 tokens) melodies respectively. |
| Hardware Specification | No | The paper does not provide specific hardware details (such as GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | All experiments were run using a symmetrical ℓ1 penalty on the scheduler m, that is, penalizing mt when it is greater or smaller than target m, with m taking various values in the range [0.2, 0.5]. In all experiments, we start with a sharpness parameter λ = 0.1, and increase it by 0.1 per epoch to a maximum value of 1. |
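The quoted experiment setup can be sketched in code. The following is a minimal, hypothetical illustration (function and variable names are ours, not the paper's) of a symmetric ℓ1 penalty that pushes each scheduler output m_t toward a target value m̄, and of a sharpness parameter λ that starts at 0.1 and grows by 0.1 per epoch up to a maximum of 1:

```python
# Hypothetical sketch of the penalty and schedule described in the quoted
# experiment setup; names are illustrative, not taken from the paper.

def symmetric_l1_penalty(m_values, m_target):
    """Symmetric l1 penalty: each scheduler output m_t is penalized for
    deviating from m_target in either direction, summed over time steps."""
    return sum(abs(m_t - m_target) for m_t in m_values)

def sharpness_schedule(epoch, start=0.1, step=0.1, max_value=1.0):
    """Sharpness parameter lambda: starts at `start` and increases by
    `step` per epoch, capped at `max_value` (0.1 -> 1.0 in the paper)."""
    return min(start + step * epoch, max_value)
```

For example, with a target m̄ = 0.4, scheduler outputs of 0.2 and 0.6 incur the same penalty, reflecting the symmetry of the ℓ1 term; the schedule reaches its cap of 1.0 after nine epochs and stays there.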