Latent Space Policies for Hierarchical Reinforcement Learning
Authors: Tuomas Haarnoja, Kristian Hartikainen, Pieter Abbeel, Sergey Levine
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation demonstrates that we can improve on the performance of single-layer policies on standard benchmark tasks simply by adding additional layers, and that our method can solve more complex sparse-reward tasks by learning higher-level policies on top of high-entropy skills optimized for simple low-level objectives. |
| Researcher Affiliation | Academia | ¹Berkeley Artificial Intelligence Research, University of California, Berkeley, USA. ²Independent researcher, Seattle, WA, USA. Correspondence to: Tuomas Haarnoja <haarnoja@berkeley.edu>, Kristian Hartikainen <kristian.hartikainen@gmail.com>. |
| Pseudocode | Yes | Algorithm 1 Latent Space Policy Learning (see the sketch after this table) |
| Open Source Code | Yes | We have released our code for reproducibility. |
| Open Datasets | Yes | Our experiments were conducted on several continuous control benchmark tasks from the OpenAI Gym benchmark suite (Brockman et al., 2016). |
| Dataset Splits | No | The paper uses standard benchmark tasks from OpenAI Gym but does not explicitly provide details about training, validation, or test dataset splits (e.g., percentages or sample counts) needed for reproduction in the main text. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., CPU, GPU models, memory, cloud resources) used to run its experiments. |
| Software Dependencies | No | The paper mentions using 'soft actor-critic' and 'OpenAI Gym' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | No | The paper describes the general architecture and experimental procedures but does not provide specific hyperparameter values (e.g., learning rate, batch size, epochs) or detailed training configurations in the main text. |
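For context on the method the rows above assess, here is a minimal sketch of the layered latent-space policy idea: each layer maps a latent input to an action conditioned on the observation, and the latent of one layer serves as the action space of the layer trained above it. This is an illustrative toy, not the paper's implementation: the paper composes invertible real NVP flows trained with soft actor-critic, whereas the `AffineLayer` and `LatentSpacePolicy` classes below are hypothetical stand-ins using a simple conditional affine transform.

```python
import numpy as np

# Toy stand-in for the paper's invertible (real NVP) policy layers:
# a conditional affine map a = scale(s) * z + shift(s), which is
# invertible in z because scale(s) is strictly positive.
class AffineLayer:
    def __init__(self, obs_dim, act_dim, rng):
        self.W_scale = rng.normal(scale=0.1, size=(obs_dim, act_dim))
        self.W_shift = rng.normal(scale=0.1, size=(obs_dim, act_dim))

    def forward(self, obs, z):
        scale = np.exp(obs @ self.W_scale)  # exp keeps the map invertible
        shift = obs @ self.W_shift
        return scale * z + shift


class LatentSpacePolicy:
    """Stack of layers ordered from highest to lowest: the latent emitted
    by a higher layer is consumed as the input of the layer below it, and
    the bottom layer's output is the environment action."""
    def __init__(self, layers):
        self.layers = layers

    def act(self, obs, top_latent):
        z = top_latent
        for layer in self.layers:
            z = layer.forward(obs, z)
        return z


rng = np.random.default_rng(0)
obs_dim, act_dim = 4, 2
policy = LatentSpacePolicy([AffineLayer(obs_dim, act_dim, rng),
                            AffineLayer(obs_dim, act_dim, rng)])

obs = rng.normal(size=obs_dim)
z_top = rng.normal(size=act_dim)  # in the paper, sampled by the top-level policy
action = policy.act(obs, z_top)
print(action.shape)  # (2,)
```

In the paper, the stack is built bottom-up: each new layer is trained on top of the layers beneath it, treating their latent input as its own action output, which is how higher-level policies come to control the high-entropy skills learned by the lower layers.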