Strategic Attentive Writer for Learning Macro-Actions

Authors: Alexander Vezhnevets, Volodymyr Mnih, Simon Osindero, Alex Graves, Oriol Vinyals, John Agapiou, Koray Kavukcuoglu

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally demonstrate that STRAW delivers strong improvements on several ATARI games by employing temporally extended planning strategies (e.g. Ms. Pacman and Frostbite). It is at the same time a general algorithm that can be applied to any sequence data. To that end, we also show that when trained on a text prediction task, STRAW naturally predicts frequent n-grams (instead of macro-actions), demonstrating the generality of the approach. We evaluate STRAW on a subset of Atari games that require longer-term planning and show that it leads to substantial improvements in scores. We also demonstrate the generality of the STRAW architecture by training it on a text prediction task and show that it learns to use frequent n-grams as the macro-actions on this task. Section 5 presents the experimental evaluation of STRAW on 8 ATARI games, 2D maze navigation and next character prediction tasks.
Researcher Affiliation | Industry | Alexander (Sasha) Vezhnevets, Volodymyr Mnih, John Agapiou, Simon Osindero, Alex Graves, Oriol Vinyals, Koray Kavukcuoglu; Google DeepMind; {vezhnick,vmnih,jagapiou,osindero,gravesa,vinyals,korayk}@google.com
Pseudocode | Yes | Algorithm 1: Action-plan update. (A hedged sketch of this update appears below the table.)
Open Source Code | No | The paper contains no unambiguous statement that the authors are releasing the code for the work described, nor a direct link to a source-code repository.
Open Datasets | Yes | To demonstrate that it is capable of learning output patterns with complex structure, we present a qualitative experiment on next character prediction using the Penn Treebank dataset [16]. We evaluate STRAW on a subset of Atari games that require longer-term planning and show that it leads to substantial improvements in scores. We also demonstrate the generality of the STRAW architecture by training it on a text prediction task and show that it learns to use frequent n-grams as the macro-actions on this task. Section 5 presents the experimental evaluation of STRAW on 8 ATARI games, 2D maze navigation and next character prediction tasks.
Dataset Splits | No | The paper describes training on various tasks and environments (Atari games, 2D mazes, text prediction) but gives no train/validation/test split details: no percentages, sample counts, or references to predefined splits that would allow reproduction.
Hardware Specification | No | The paper does not report the hardware used for its experiments (e.g. exact GPU/CPU models, processor types, memory amounts, or other machine specifications).
Software Dependencies | No | The paper mentions using Python, LSTMs, CNNs, the A3C method, and RMSProp, but it specifies no version numbers for these software components, which are necessary for reproducibility.
Experiment Setup | Yes | The read and write patches are 10-dimensional, and h is a 2-layer perceptron with 64 hidden units. The time horizon T = 500. For STRAWe (Sec. 3.1) the Gaussian distribution for structured exploration is 128-dimensional. The learning rate and entropy penalty were sampled from a LogUniform(10^-4, 10^-3) interval. The learning rate is linearly annealed from its sampled value to 0. To explore STRAW's behaviour, the coding cost is sampled as alpha ~ LogUniform(10^-7, 10^-4) and the replanning penalty as lambda ~ LogUniform(10^-6, 10^-2).