FeUdal Networks for Hierarchical Reinforcement Learning

Authors: Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on a selection of ATARI games (including the infamous Montezuma's Revenge) and on several memory tasks in the 3D DeepMind Lab environment (Beattie et al., 2016) show that FuN significantly improves long-term credit assignment and memorisation. From Section 5 (Experiments): The goal of our experiments is to demonstrate that FuN learns non-trivial, helpful, and interpretable sub-policies and sub-goals, and also to validate components of the architecture.
Researcher Affiliation | Industry | DeepMind, London, United Kingdom. Correspondence to: Alexander Sasha Vezhnevets <vezhnick@google.com>.
Pseudocode | No | The paper describes algorithms and architectures using text and equations, but it does not contain structured pseudocode or algorithm blocks with explicit labels such as 'Pseudocode' or 'Algorithm'.
Open Source Code | No | The paper does not provide concrete access to source code (no specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described.
Open Datasets | Yes | Our experiments on a selection of ATARI games (including the infamous Montezuma's Revenge) and on several memory tasks in the 3D DeepMind Lab environment (Beattie et al., 2016). Montezuma's Revenge is one of the hardest games available through the ALE (Bellemare et al., 2012).
Dataset Splits | No | The paper describes training and evaluation on game environments and episodes, but it does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, and test sets.
Hardware Specification | No | The paper states 'We trained our models using...' but does not provide specific hardware details (e.g., CPU/GPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions software components and methods such as the 'A3C method (Mnih et al., 2016)' and 'shared RMSProp', but it does not provide specific version numbers for these or any other software dependencies (e.g., 'Python 3.8', 'PyTorch 1.9').
Experiment Setup | Yes | Optimisation. We use the A3C method (Mnih et al., 2016) for all reinforcement learning experiments. We cut the trajectory and run backpropagation through time (BPTT) ... For FuN K = 400; for LSTM, unless otherwise stated, K = 40. The optimization process runs 32 asynchronous threads using shared RMSProp. Learning rate and entropy penalty were sampled from a LogUniform(10^-4, 10^-3) interval for LSTM. For FuN the learning rate was sampled from LogUniform(10^-4.5, 10^-3.5). We sample its weight α ~ Uniform(0, 1). For all ATARI experiments we clip the reward to the [-1, +1] interval. We use a discount of 0.99 for LSTM; for FuN we use 0.99 in the Worker and 0.999 in the Manager. The perceptual module f_percept is a convolutional network (CNN) ... The CNN has a first layer with 16 8x8 filters of stride 4, followed by a layer with 32 4x4 filters of stride 2. The fully connected layer has 256 hidden units. The dimensionality of the embedding vectors, w, is set as k = 16. In the experiments we set r = 10, and this was also used as the prediction horizon, c.
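
As a concrete reading of the quoted setup, the sketch below shows a perceptual CNN with the stated layer shapes and a helper that samples the quoted per-run hyperparameters. It assumes PyTorch and standard 84x84, 4-frame ATARI preprocessing, neither of which is specified in the excerpt; the names PerceptualModule and sample_run_config are hypothetical and this is not the authors' released code.

```python
import random

import torch
import torch.nn as nn


class PerceptualModule(nn.Module):
    """f_percept as quoted: 16 8x8 filters stride 4, 32 4x4 filters stride 2, FC-256."""

    def __init__(self, in_channels: int = 4, hidden: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
        )
        # With assumed 84x84 inputs the conv stack yields a 32x9x9 feature map.
        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(32 * 9 * 9, hidden), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(x))


def sample_run_config(is_fun: bool = True) -> dict:
    """Sample per-run hyperparameters from the quoted ranges (LSTM baseline: LogUniform(1e-4, 1e-3))."""
    lo, hi = (-4.5, -3.5) if is_fun else (-4.0, -3.0)
    return {
        "learning_rate": 10 ** random.uniform(lo, hi),   # LogUniform sampling
        "alpha": random.uniform(0.0, 1.0),                # weight sampled from Uniform(0, 1)
        "bptt_length": 400 if is_fun else 40,             # K
        "gamma_worker": 0.99,
        "gamma_manager": 0.999 if is_fun else None,
        "reward_clip": (-1.0, 1.0),                       # ATARI reward clipping
        "num_threads": 32,                                # asynchronous threads
        "embedding_dim": 16,                              # k
        "horizon": 10,                                    # r, also the prediction horizon c
    }


if __name__ == "__main__":
    net = PerceptualModule()
    features = net(torch.zeros(1, 4, 84, 84))
    print(features.shape, sample_run_config(is_fun=True))
```

The helper only mirrors the quoted sampling ranges for a single run; the entropy-penalty sampling, the A3C training loop, and the shared RMSProp optimiser are not reproduced here.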