Strategic Attentive Writer for Learning Macro-Actions
Authors: Alexander Vezhnevets, Volodymyr Mnih, Simon Osindero, Alex Graves, Oriol Vinyals, John Agapiou, Koray Kavukcuoglu
NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally demonstrate that STRAW delivers strong improvements on several ATARI games by employing temporally extended planning strategies (e.g. Ms. Pacman and Frostbite). It is at the same time a general algorithm that can be applied on any sequence data. To that end, we also show that when trained on text prediction task, STRAW naturally predicts frequent n-grams (instead of macro-actions), demonstrating the generality of the approach. We evaluate STRAW on a subset of Atari games that require longer term planning and show that it leads to substantial improvements in scores. We also demonstrate the generality of the STRAW architecture by training it on a text prediction task and show that it learns to use frequent n-grams as the macro-actions on this task. Section 5 presents the experimental evaluation of STRAW on 8 ATARI games, 2D maze navigation and next character prediction tasks. |
| Researcher Affiliation | Industry | Alexander (Sasha) Vezhnevets, Volodymyr Mnih, John Agapiou, Simon Osindero, Alex Graves, Oriol Vinyals, Koray Kavukcuoglu Google DeepMind {vezhnick,vmnih,jagapiou,osindero,gravesa,vinyals,korayk}@google.com |
| Pseudocode | Yes | Algorithm 1 Action-plan update (an illustrative sketch of the attentive read/write step it relies on appears after the table) |
| Open Source Code | No | The paper does not provide an unambiguous sentence stating that the authors are releasing the code for the work described, nor does it include a direct link to a source-code repository. |
| Open Datasets | Yes | To demonstrate that it is capable of learning output patterns with complex structure we present a qualitative experiment on next character prediction using Penn Treebank dataset [16]. We evaluate STRAW on a subset of Atari games that require longer term planning and show that it leads to substantial improvements in scores. We also demonstrate the generality of the STRAW architecture by training it on a text prediction task and show that it learns to use frequent n-grams as the macro-actions on this task. Section 5 presents the experimental evaluation of STRAW on 8 ATARI games, 2D maze navigation and next character prediction tasks. |
| Dataset Splits | No | The paper mentions training on various tasks and environments (Atari games, 2D mazes, text prediction) but does not provide specific details on train/validation/test dataset splits in terms of percentages, sample counts, or references to predefined splits for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using Python, LSTM, CNN, A3C method, and RMSProp, but it does not specify any version numbers for these software components, which are necessary for reproducibility. |
| Experiment Setup | Yes | The read and write patches are A × 10-dimensional, and h is a 2-layer perceptron with 64 hidden units. The time horizon T = 500. For STRAWe (sec. 3.1) the Gaussian distribution for structured exploration is 128-dimensional. Learning rate and entropy penalty were sampled from a LogUniform(10⁻⁴, 10⁻³) interval. Learning rate is linearly annealed from a sampled value to 0. To explore STRAW behaviour, we sample coding cost α ~ LogUniform(10⁻⁷, 10⁻⁴) and replanning penalty λ ~ LogUniform(10⁻⁶, 10⁻²). (A sampling sketch follows the table.) |
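
For context on the Pseudocode row, the following is a minimal illustrative sketch of the attentive read/write operation on which the paper's Algorithm 1 (action-plan update) relies, assuming DRAW-style 1D Gaussian attention over the temporal axis of the plan. The function names, shapes, and parameter values here are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of a DRAW-style 1D attentive read/write over an action-plan
# matrix, in the spirit of STRAW's Algorithm 1 (action-plan update).
# Names, shapes, and parameters are illustrative assumptions.
import numpy as np

def gaussian_filterbank(plan_len, patch_len, center, stride, sigma):
    """Return a (patch_len x plan_len) bank of normalized 1D Gaussian filters."""
    # Filter centers are spaced `stride` apart around `center` on the time axis.
    mu = center + (np.arange(patch_len) - patch_len / 2.0 + 0.5) * stride
    t = np.arange(plan_len)
    f = np.exp(-((t[None, :] - mu[:, None]) ** 2) / (2.0 * sigma ** 2))
    return f / (f.sum(axis=1, keepdims=True) + 1e-8)

def attentive_read(plan, center, stride, sigma, patch_len=10):
    """Extract a (num_actions x patch_len) patch from the (num_actions x T) plan."""
    filters = gaussian_filterbank(plan.shape[1], patch_len, center, stride, sigma)
    return plan @ filters.T

def attentive_write(plan, patch, center, stride, sigma):
    """Additively write a (num_actions x patch_len) patch back into the plan."""
    filters = gaussian_filterbank(plan.shape[1], patch.shape[1], center, stride, sigma)
    return plan + patch @ filters

# Toy usage: a 500-step plan over 4 actions, read/modify/write around step 42.
plan = np.zeros((4, 500))
patch = attentive_read(plan, center=42.0, stride=1.0, sigma=1.0)
plan = attentive_write(plan, patch + 0.1, center=42.0, stride=1.0, sigma=1.0)
```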
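
The hyperparameter sampling quoted in the Experiment Setup row can be made concrete with a short sketch. This is a minimal illustration assuming NumPy; the interval endpoints come from the quoted text, while the variable names and RNG choice are assumptions.

```python
# Hedged sketch of the quoted hyperparameter sampling: values drawn
# log-uniformly, with the learning rate annealed linearly to 0.
import numpy as np

rng = np.random.default_rng(0)

def log_uniform(low, high):
    """Sample a value whose logarithm is uniform on [log(low), log(high)]."""
    return float(np.exp(rng.uniform(np.log(low), np.log(high))))

learning_rate   = log_uniform(1e-4, 1e-3)
entropy_penalty = log_uniform(1e-4, 1e-3)
coding_cost     = log_uniform(1e-7, 1e-4)   # alpha in the paper
replan_penalty  = log_uniform(1e-6, 1e-2)   # lambda in the paper

def annealed_lr(step, total_steps, lr0=learning_rate):
    """Linearly anneal the learning rate from its sampled value down to 0."""
    return lr0 * max(0.0, 1.0 - step / total_steps)

print(learning_rate, annealed_lr(step=50_000, total_steps=100_000))
```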