Near-Optimal Representation Learning for Hierarchical Reinforcement Learning
Authors: Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results on a number of difficult continuous-control tasks show that our approach to representation learning yields qualitatively better representations as well as quantitatively better hierarchical policies, compared to existing methods. |
| Researcher Affiliation | Collaboration | Ofir Nachum, Shixiang Gu, Honglak Lee & Sergey Levine, Google Brain ({ofirnachum,shanegu,honglak,slevine}@google.com). Also at UC Berkeley. |
| Pseudocode | Yes | Pseudocode of the full algorithm is presented in the Appendix (see Algorithm 1). |
| Open Source Code | Yes | Find open-source code at https://github.com/tensorflow/models/tree/master/research/efficient-hrl |
| Open Datasets | Yes | We evaluate on the following continuous-control MuJoCo (Todorov et al., 2012) tasks (see Appendix C for details): |
| Dataset Splits | No | The paper describes how training and evaluation tasks are set up in continuous-control environments rather than providing explicit splits of a static dataset; this is common in RL but does not meet the strict definition of dataset splits. |
| Hardware Specification | No | The paper does not specify any hardware details like GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software components or libraries used. |
| Experiment Setup | Yes | We use a Huber function for D, the distance function used to compute the low-level reward. We use a goal dimension of size 2. We train the higher-level policy to output actions in [-10, 10]^2. We use a Gaussian with standard deviation 5 for high-level exploration. We parameterize f_θ with a feed-forward neural network with two hidden layers of dimension 100 using ReLU activations. The network structure for φ_θ is identical, except using hidden layer dimensions 400 and 300. These networks are trained with the Adam optimizer using learning rate 0.0001. (A minimal code sketch of these settings follows the table.) |
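
To make the reported hyperparameters concrete, below is a minimal sketch of the two networks and the Huber-based low-level reward, assuming TensorFlow/Keras. The layer sizes, activations, goal dimension, and learning rate follow the values quoted above; the function names, input dimensions, and the exact reward wiring are illustrative assumptions, not the authors' released implementation (see the repository linked in the Open Source Code row for that).

```python
# Sketch of the networks and low-level reward described in the Experiment Setup row.
# Hyperparameters (hidden sizes, ReLU, goal dim 2, Adam lr 1e-4) follow the paper;
# names and input dimensions here are illustrative assumptions.
import numpy as np
import tensorflow as tf


def build_f_theta(state_dim, goal_dim=2):
    """Representation network f_theta: two hidden layers of 100 units, ReLU."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(100, activation="relu", input_shape=(state_dim,)),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(goal_dim),
    ])


def build_phi_theta(input_dim, goal_dim=2):
    """phi_theta: same structure as f_theta, but hidden layers of 400 and 300 units."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(400, activation="relu", input_shape=(input_dim,)),
        tf.keras.layers.Dense(300, activation="relu"),
        tf.keras.layers.Dense(goal_dim),
    ])


def low_level_reward(f_next_state, goal, delta=1.0):
    """Low-level reward as the negative Huber distance D between the encoded
    next state and the 2-dimensional goal (the distance function D above)."""
    diff = f_next_state - goal
    abs_diff = np.abs(diff)
    huber = np.where(abs_diff <= delta,
                     0.5 * diff ** 2,
                     delta * (abs_diff - 0.5 * delta))
    return -np.sum(huber, axis=-1)


# Both networks are trained with Adam at the reported learning rate of 0.0001.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
```

With `goal_dim=2` this matches the stated goal dimension, and the higher-level policy would emit goals in [-10, 10]^2 with Gaussian exploration noise of standard deviation 5, as reported in the table.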