FeUdal Networks for Hierarchical Reinforcement Learning

Authors: Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on a selection of ATARI games (including the infamous Montezuma's Revenge) and on several memory tasks in the 3D DeepMind Lab environment (Beattie et al., 2016) show that FuN significantly improves long-term credit assignment and memorisation. From Section 5 (Experiments): The goal of our experiments is to demonstrate that FuN learns non-trivial, helpful, and interpretable sub-policies and sub-goals, and also to validate components of the architecture.
Researcher Affiliation | Industry | DeepMind, London, United Kingdom. Correspondence to: Alexander Sasha Vezhnevets <vezhnick@google.com>.
Pseudocode | No | The paper describes algorithms and architectures using text and equations, but it does not contain structured pseudocode or algorithm blocks with explicit labels such as 'Pseudocode' or 'Algorithm'.
Open Source Code | No | The paper does not provide concrete access to source code (no specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described.
Open Datasets | Yes | Our experiments on a selection of ATARI games (including the infamous Montezuma's Revenge) and on several memory tasks in the 3D DeepMind Lab environment (Beattie et al., 2016). Montezuma's Revenge is one of the hardest games available through the ALE (Bellemare et al., 2012).
Dataset Splits | No | The paper describes training and evaluation on game environments and episodes, but it does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, and test sets.
Hardware Specification | No | The paper states 'We trained our models using...' but does not provide specific hardware details (e.g., CPU/GPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions software components and methods such as the 'A3C method (Mnih et al., 2016)' and 'shared RMSProp', but it does not provide specific version numbers for these or any other software dependencies (e.g., 'Python 3.8', 'PyTorch 1.9').
Experiment Setup | Yes | Optimisation. We use the A3C method (Mnih et al., 2016) for all reinforcement learning experiments. We cut the trajectory and run backpropagation through time (BPTT) ... For FuN K = 400; for LSTM, unless otherwise stated, K = 40. The optimization process runs 32 asynchronous threads using shared RMSProp. Learning rate and entropy penalty were sampled from a LogUniform(10^-4, 10^-3) interval for LSTM. For FuN the learning rate was sampled from LogUniform(10^-4.5, 10^-3.5). We sample its weight α ~ Uniform(0, 1). For all ATARI experiments we clip the reward to the [-1, +1] interval. We use a discount of 0.99 for LSTM; for FuN we use 0.99 in the Worker and 0.999 in the Manager. The perceptual module f_percept is a convolutional network (CNN) ... The CNN has a first layer with 16 8x8 filters of stride 4, followed by a layer with 32 4x4 filters of stride 2. The fully connected layer has 256 hidden units. The dimensionality of the embedding vectors, w, is set as k = 16. In the experiments we set r = 10, and this was also used as the prediction horizon, c.
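
As a concrete reading of the quoted setup, the sketch below shows a perceptual CNN with the stated layer shapes and a helper that samples the quoted per-run hyperparameters. It assumes PyTorch and standard 84x84, 4-frame ATARI preprocessing, neither of which is specified in the excerpt; the names PerceptualModule and sample_run_config are hypothetical and this is not the authors' released code.

```python
import random

import torch
import torch.nn as nn


class PerceptualModule(nn.Module):
    """f_percept as quoted: 16 8x8 filters stride 4, 32 4x4 filters stride 2, FC-256."""

    def __init__(self, in_channels: int = 4, hidden: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
        )
        # With assumed 84x84 inputs the conv stack yields a 32x9x9 feature map.
        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(32 * 9 * 9, hidden), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(x))


def sample_run_config(is_fun: bool = True) -> dict:
    """Sample per-run hyperparameters from the quoted ranges (LSTM baseline: LogUniform(1e-4, 1e-3))."""
    lo, hi = (-4.5, -3.5) if is_fun else (-4.0, -3.0)
    return {
        "learning_rate": 10 ** random.uniform(lo, hi),   # LogUniform sampling
        "alpha": random.uniform(0.0, 1.0),                # weight sampled from Uniform(0, 1)
        "bptt_length": 400 if is_fun else 40,             # K
        "gamma_worker": 0.99,
        "gamma_manager": 0.999 if is_fun else None,
        "reward_clip": (-1.0, 1.0),                       # ATARI reward clipping
        "num_threads": 32,                                # asynchronous threads
        "embedding_dim": 16,                              # k
        "horizon": 10,                                    # r, also the prediction horizon c
    }


if __name__ == "__main__":
    net = PerceptualModule()
    features = net(torch.zeros(1, 4, 84, 84))
    print(features.shape, sample_run_config(is_fun=True))
```

The helper only mirrors the quoted sampling ranges for a single run; the entropy-penalty sampling, the A3C training loop, and the shared RMSProp optimiser are not reproduced here.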