Stochastic Neural Networks for Hierarchical Reinforcement Learning
Authors: Carlos Florensa, Yan Duan, Pieter Abbeel
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that this combination is effective in learning a wide span of interpretable skills in a sample-efficient way, and can significantly boost the learning performance uniformly across a wide range of downstream tasks. |
| Researcher Affiliation | Collaboration | Carlos Florensa, Yan Duan, Pieter Abbeel; UC Berkeley, Department of Electrical Engineering and Computer Science; International Computer Science Institute; OpenAI. florensa@berkeley.edu, {rocky,pieter}@openai.com |
| Pseudocode | Yes | Algorithm 1: Skill training for SNNs with MI bonus (a hedged sketch of this bonus appears after this table). |
| Open Source Code | Yes | Code available at: https://github.com/florensacc/snn4hrl |
| Open Datasets | Yes | We have applied our framework to the two hierarchical tasks described in the benchmark by Duan et al. (2016): Locomotion + Maze and Locomotion + Food Collection (Gather). |
| Dataset Splits | No | The paper describes pre-training and downstream tasks, and uses terms like 'batch size' and 'maximum path length' related to online data collection in reinforcement learning, but does not provide explicit training, validation, and test dataset splits in the traditional sense of fixed data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions 'TRPO' as the policy optimization algorithm but does not provide specific software dependencies like library names with version numbers (e.g., 'Python 3.x', 'PyTorch x.x') that are needed to replicate the experiment. |
| Experiment Setup | Yes | All policies are trained with TRPO with step size 0.01 and discount 0.99. All neural networks (each of the Multi-policy ones, the SNN, and the Manager Network) have 2 layers of 32 hidden units. For the SNN training, the mesh density used to grid the (x, y) space and give the MI bonus is 10 divisions/unit. The number of skills trained (i.e., the dimension of the latent variable in the SNN or the number of independently trained policies in the Multi-policy setup) is 6. The batch size and the maximum path length for the pre-train task are the ones used in the benchmark (Duan et al., 2016): 50,000 and 500 respectively. For the downstream tasks, see Tab. 1. (These values are collected in the config sketch after this table.) |
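
The pseudocode row above points to Algorithm 1, skill training for SNNs with a mutual-information bonus over a discretized (x, y) grid. Below is a minimal sketch of how such a mesh-based bonus could be maintained, assuming the plane is gridded at the reported mesh density and that the bonus is proportional to the log empirical probability of the latent code given the visited cell. The class name, the coefficient, and the count-based estimator are illustrative assumptions, not the authors' implementation (see the released snn4hrl code for the actual one).

```python
# Sketch of a mesh-based MI bonus for SNN skill pre-training.
# Assumption: bonus ~ log p_hat(latent | cell), with p_hat estimated from
# visitation counts accumulated over the current batch of rollouts.
from collections import defaultdict
import math


class MeshMIBonus:
    def __init__(self, mesh_density=10, bonus_coeff=0.001):
        self.mesh_density = mesh_density  # 10 divisions/unit (paper value)
        self.bonus_coeff = bonus_coeff    # hypothetical scaling coefficient
        self.cell_latent_counts = defaultdict(lambda: defaultdict(int))
        self.cell_counts = defaultdict(int)

    def _cell(self, x, y):
        # Discretize the agent's (x, y) center-of-mass position into a grid cell.
        return (int(math.floor(x * self.mesh_density)),
                int(math.floor(y * self.mesh_density)))

    def update(self, x, y, latent):
        # Accumulate visitation counts for the visited cell under the active latent.
        c = self._cell(x, y)
        self.cell_latent_counts[c][latent] += 1
        self.cell_counts[c] += 1

    def bonus(self, x, y, latent):
        # Reward states whose cell is predictive of the active latent code.
        c = self._cell(x, y)
        if self.cell_counts[c] == 0:
            return 0.0
        p_latent_given_cell = self.cell_latent_counts[c][latent] / self.cell_counts[c]
        if p_latent_given_cell == 0.0:
            return 0.0
        return self.bonus_coeff * math.log(p_latent_given_cell)
```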
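
For convenience, the hyperparameters quoted in the experiment-setup row can be restated as a plain config. The field names are ours; the values are copied from the paper's description, and the downstream-task batch sizes and horizons remain in the paper's Tab. 1.

```python
# Reported pre-training hyperparameters, restated as an illustrative config dict.
PRETRAIN_CONFIG = {
    "algo": "TRPO",
    "step_size": 0.01,          # TRPO step size
    "discount": 0.99,           # reward discount factor
    "hidden_sizes": (32, 32),   # 2 layers of 32 units for every network
    "num_skills": 6,            # latent dimension / number of independent policies
    "mesh_density": 10,         # divisions per unit for the (x, y) MI-bonus grid
    "batch_size": 50_000,       # pre-train batch size (Duan et al., 2016 benchmark)
    "max_path_length": 500,     # pre-train horizon (Duan et al., 2016 benchmark)
}
```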