Memory-Efficient Backpropagation Through Time
Authors: Audrunas Gruslys, Rémi Munos, Ivo Danihelka, Marc Lanctot, Alex Graves
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We used an LSTM mapping 256 inputs to 256 with a batch size of 64 and measured execution time for a single gradient descent step (forward and backward operation combined) as a function of sequence length (Figure 2(b)). |
| Researcher Affiliation | Industry | Audrunas Gruslys (Google DeepMind, audrunas@google.com), Rémi Munos (Google DeepMind, munos@google.com), Ivo Danihelka (Google DeepMind, danihelka@google.com), Marc Lanctot (Google DeepMind, lanctot@google.com), Alex Graves (Google DeepMind, gravesa@google.com) |
| Pseudocode | Yes | Pseudocode is given in the supplementary material. |
| Open Source Code | No | The paper states 'Pseudocode is given in the supplementary material' but provides no link to, or explicit statement of, a public release of the source code for the methodology. |
| Open Datasets | No | The paper mentions using 'an LSTM mapping 256 inputs to 256' but does not specify a publicly available or open dataset name, link, DOI, or formal citation. |
| Dataset Splits | No | The paper does not provide specific dataset split information (percentages, sample counts, or detailed methodology) needed to reproduce data partitioning. |
| Hardware Specification | No | The paper mentions 'Graphics Processing Units (GPUs)' in general but does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | We used an LSTM mapping 256 inputs to 256 with a batch size of 64 and measured execution time for a single gradient descent step (forward and backward operation combined) as a function of sequence length (Figure 2(b)). (A hedged timing sketch follows the table.) |
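The experiment-setup row above fixes only the LSTM dimensions (256 inputs to 256 hidden units), the batch size (64), and the quantity being timed (one combined forward and backward pass as a function of sequence length); the paper does not name a framework or release code. Below is a minimal sketch of that timing measurement, assuming PyTorch and synthetic random inputs. It is an illustration only, not the authors' implementation, and it uses plain BPTT rather than the paper's memory-saving recomputation schedule.

```python
# Hedged sketch (not the authors' code): time one forward+backward pass of an
# LSTM mapping 256 inputs to 256 hidden units with batch size 64, for several
# sequence lengths. The framework (PyTorch) and random inputs are assumptions.
import time
import torch
import torch.nn as nn

def time_one_step(seq_len, input_size=256, hidden_size=256, batch_size=64):
    lstm = nn.LSTM(input_size, hidden_size)           # 256 -> 256 LSTM
    x = torch.randn(seq_len, batch_size, input_size)  # (T, B, 256) random inputs
    start = time.perf_counter()
    out, _ = lstm(x)          # forward pass over the whole sequence
    out.sum().backward()      # backward pass (plain BPTT, all activations stored)
    # On a GPU, torch.cuda.synchronize() would be needed before reading the clock.
    return time.perf_counter() - start

for t in (64, 128, 256, 512, 1024):
    print(f"seq_len={t:5d}  time={time_one_step(t):.3f}s")
```

A memory-efficient variant in the spirit of the paper would store only a subset of per-timestep states during the forward pass and recompute the rest on the fly during the backward pass, trading extra forward computation for reduced memory.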