Near-Optimal Representation Learning for Hierarchical Reinforcement Learning
Authors: Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results on a number of difficult continuous-control tasks show that our approach to representation learning yields qualitatively better representations as well as quantitatively better hierarchical policies, compared to existing methods. |
| Researcher Affiliation | Collaboration | Ofir Nachum, Shixiang Gu, Honglak Lee & Sergey Levine, Google Brain ({ofirnachum,shanegu,honglak,slevine}@google.com). Also at UC Berkeley. |
| Pseudocode | Yes | Pseudocode of the full algorithm is presented in the Appendix (see Algorithm 1). |
| Open Source Code | Yes | Find open-source code at https://github.com/tensorflow/models/tree/master/research/efficient-hrl |
| Open Datasets | Yes | We evaluate on the following continuous-control MuJoCo (Todorov et al., 2012) tasks (see Appendix C for details): |
| Dataset Splits | No | The paper describes how training and evaluation tasks are set up in continuous-control environments rather than providing explicit splits of a static dataset; this is common in RL but does not meet the strict definition of dataset splits. |
| Hardware Specification | No | The paper does not specify any hardware details like GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software components or libraries used. |
| Experiment Setup | Yes | We use a Huber function for D, the distance function used to compute the low-level reward. We use a goal dimension of size 2. We train the higher-level policy to output actions in [-10, 10]^2. We use a Gaussian with standard deviation 5 for high-level exploration. We parameterize f_θ with a feed-forward neural network with two hidden layers of dimension 100 using ReLU activations. The network structure for φ_θ is identical, except using hidden layer dimensions 400 and 300. These networks are trained with the Adam optimizer using learning rate 0.0001. (A minimal code sketch of these settings follows the table.) |
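
To make the reported hyperparameters concrete, below is a minimal sketch of the two networks and the Huber-based low-level reward, assuming TensorFlow/Keras. The layer sizes, activations, goal dimension, and learning rate follow the values quoted above; the function names, input dimensions, and the exact reward wiring are illustrative assumptions, not the authors' released implementation (see the repository linked in the Open Source Code row for that).

```python
# Sketch of the networks and low-level reward described in the Experiment Setup row.
# Hyperparameters (hidden sizes, ReLU, goal dim 2, Adam lr 1e-4) follow the paper;
# names and input dimensions here are illustrative assumptions.
import numpy as np
import tensorflow as tf


def build_f_theta(state_dim, goal_dim=2):
    """Representation network f_theta: two hidden layers of 100 units, ReLU."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(100, activation="relu", input_shape=(state_dim,)),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(goal_dim),
    ])


def build_phi_theta(input_dim, goal_dim=2):
    """phi_theta: same structure as f_theta, but hidden layers of 400 and 300 units."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(400, activation="relu", input_shape=(input_dim,)),
        tf.keras.layers.Dense(300, activation="relu"),
        tf.keras.layers.Dense(goal_dim),
    ])


def low_level_reward(f_next_state, goal, delta=1.0):
    """Low-level reward as the negative Huber distance D between the encoded
    next state and the 2-dimensional goal (the distance function D above)."""
    diff = f_next_state - goal
    abs_diff = np.abs(diff)
    huber = np.where(abs_diff <= delta,
                     0.5 * diff ** 2,
                     delta * (abs_diff - 0.5 * delta))
    return -np.sum(huber, axis=-1)


# Both networks are trained with Adam at the reported learning rate of 0.0001.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
```

With `goal_dim=2` this matches the stated goal dimension, and the higher-level policy would emit goals in [-10, 10]^2 with Gaussian exploration noise of standard deviation 5, as reported in the table.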