Learning to Transduce with Unbounded Memory
Authors: Edward Grefenstette, Karl Moritz Hermann, Mustafa Suleyman, Phil Blunsom
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper we explore the representational power of these models using synthetic grammars designed to exhibit phenomena similar to those found in real transduction problems such as machine translation. These experiments lead us to propose new memory-based recurrent networks that implement continuously differentiable analogues of traditional data structures such as Stacks, Queues, and DeQues. We show that these architectures exhibit superior generalisation performance to Deep RNNs and are often able to learn the underlying generating algorithms in our transduction experiments. (A sketch of the differentiable stack update follows the table.) |
| Researcher Affiliation | Collaboration | Edward Grefenstette (etg@google.com), Karl Moritz Hermann (kmh@google.com), and Mustafa Suleyman (mustafasul@google.com), Google DeepMind; Phil Blunsom (pblunsom@google.com), Google DeepMind and Oxford University |
| Pseudocode | No | The paper describes algorithms using mathematical equations and textual explanations (e.g., in sections 3.1, 3.2, 3.3), but it does not contain explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor are the descriptions formatted as code-like procedures. |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., repository link, explicit statement of code release) for the source code of the methodology described. |
| Open Datasets | No | The paper states that training data is randomly generated based on specified parameters (e.g., 'The length of each training source sequence is uniformly sampled from unif {8, 64}'), rather than using a pre-existing publicly available or open dataset for which concrete access information is provided. |
| Dataset Splits | No | The paper defines training and testing data ranges based on sequence length ('training data by rejecting samples that are outside of the range [8, 64], and testing data by rejecting samples outside of the range [65, 128]'), but it does not explicitly specify a separate validation split, its size (percentages or counts), or how it was created or used (a sketch of the length-based sampling follows the table). |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like 'minibatch RMSProp' (an optimizer) but does not provide specific version numbers for any libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | When running experiments, we trained and tested a version of each model where all LSTMs in each model have a hidden layer size of 256, and one for a hidden layer size of 512. The Stack/Queue/DeQue embedding size was arbitrarily set to 256, half the maximum hidden size. Models are trained with minibatch RMSProp [18], with a batch size of 10. We grid-searched learning rates across the set {5×10⁻³, 1×10⁻³, 5×10⁻⁴, 1×10⁻⁴, 5×10⁻⁵}. We used gradient clipping [19], clipping all gradients above 1. (A training-loop sketch with these hyperparameters follows the table.) |
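
The 'Research Type' row quotes the paper's description of continuously differentiable Stacks, Queues, and DeQues. Below is a minimal NumPy sketch of one neural-stack update (push, pop, read) as described in Section 3.1 of the paper; the function name and array layout are our own, and this is an illustrative reconstruction of the equations, not the authors' released code.

```python
import numpy as np

def neural_stack_step(V_prev, s_prev, v_t, d_t, u_t):
    """One update of the continuously differentiable stack (paper, Sec. 3.1).

    V_prev : (t-1, m) array of previously pushed value vectors (last row is the top)
    s_prev : (t-1,) array of their strengths in [0, 1]
    v_t    : (m,) value vector to push this step
    d_t    : scalar push strength, u_t : scalar pop strength
    Returns the new values V_t, new strengths s_t, and the read vector r_t.
    """
    t_prev = s_prev.shape[0]
    # Values are never overwritten; the new value is appended on top.
    V_t = np.vstack([V_prev, v_t[None, :]])

    # Pop: remove up to u_t of strength, starting from the top of the stack.
    s_t = np.empty(t_prev + 1)
    for i in range(t_prev):
        strength_above = s_prev[i + 1:].sum()
        s_t[i] = max(0.0, s_prev[i] - max(0.0, u_t - strength_above))
    s_t[t_prev] = d_t  # push strength assigned to the new top element

    # Read: blend the topmost values whose strengths sum to (at most) 1.
    r_t = np.zeros_like(v_t, dtype=float)
    for i in range(t_prev + 1):
        strength_above = s_t[i + 1:].sum()
        r_t += min(s_t[i], max(0.0, 1.0 - strength_above)) * V_t[i]
    return V_t, s_t, r_t


# Toy usage: push two vectors, then push a third while popping most of the top.
V, s = np.zeros((0, 3)), np.zeros(0)
V, s, r = neural_stack_step(V, s, np.array([1.0, 0.0, 0.0]), d_t=1.0, u_t=0.0)
V, s, r = neural_stack_step(V, s, np.array([0.0, 1.0, 0.0]), d_t=0.8, u_t=0.0)
V, s, r = neural_stack_step(V, s, np.array([0.0, 0.0, 1.0]), d_t=0.3, u_t=0.9)
print(r)  # a blend dominated by whatever strength remains near the top
```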
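
The 'Open Datasets' and 'Dataset Splits' rows note that training and test data are generated by sampling source-sequence lengths from [8, 64] and [65, 128] respectively. The sketch below shows one hypothetical way to generate such length-controlled sequence pairs for a reversal-style transduction task; the generator, vocabulary size, and sample counts are illustrative assumptions, not the paper's exact procedure.

```python
import random

def make_sequence_pairs(n_pairs, min_len, max_len, vocab_size=128, seed=0):
    """Hypothetical generator for a sequence-reversal transduction task.

    Source lengths are drawn uniformly from [min_len, max_len], mirroring the
    paper's training range [8, 64] and test range [65, 128]. The vocabulary
    size, pair counts, and the reversal target are illustrative choices only.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        length = rng.randint(min_len, max_len)            # inclusive uniform sample
        src = [rng.randrange(vocab_size) for _ in range(length)]
        tgt = list(reversed(src))                         # e.g. the reversal task
        pairs.append((src, tgt))
    return pairs

train_pairs = make_sequence_pairs(10_000, 8, 64)    # training lengths in [8, 64]
test_pairs = make_sequence_pairs(1_000, 65, 128)    # test lengths in [65, 128]
```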
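
The 'Experiment Setup' row quotes the optimiser, batch size, learning-rate grid, and gradient clipping. The PyTorch sketch below shows how one grid-search configuration might be trained under those hyperparameters; PyTorch itself, the toy model, and the use of norm clipping (rather than value clipping) are assumptions on our part, since the paper does not release code or name a framework.

```python
import torch
from torch import nn

# Hyperparameters quoted in the 'Experiment Setup' row; everything else
# (framework, model, data) is a placeholder.
learning_rates = [5e-3, 1e-3, 5e-4, 1e-4, 5e-5]   # grid-searched values
batch_size = 10                                   # minibatch size from the paper
hidden_sizes = [256, 512]                         # LSTM hidden-layer sizes tried

def train_one_config(model, batches, lr, epochs=1):
    """Sketch of a single grid-search configuration: RMSProp + gradient clipping."""
    optimiser = torch.optim.RMSprop(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for inputs, targets in batches:
            optimiser.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            # The paper clips all gradients above 1; norm clipping is assumed here.
            nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimiser.step()
    return model

# Toy usage: a stand-in classifier and one synthetic batch per learning rate.
toy_batch = [(torch.randn(batch_size, 8), torch.randint(0, 5, (batch_size,)))]
for lr in learning_rates:
    toy_model = nn.Sequential(nn.Linear(8, hidden_sizes[0]), nn.ReLU(),
                              nn.Linear(hidden_sizes[0], 5))
    train_one_config(toy_model, toy_batch, lr)
```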