Latent Space Policies for Hierarchical Reinforcement Learning

Authors: Tuomas Haarnoja, Kristian Hartikainen, Pieter Abbeel, Sergey Levine

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental evaluation demonstrates that we can improve on the performance of single-layer policies on standard benchmark tasks simply by adding additional layers, and that our method can solve more complex sparse-reward tasks by learning higher-level policies on top of high-entropy skills optimized for simple low-level objectives. (A hedged sketch of this layered construction appears after the table.)
Researcher Affiliation | Academia | Berkeley Artificial Intelligence Research, University of California, Berkeley, USA; Independent researcher, Seattle, WA, USA. Correspondence to: Tuomas Haarnoja <haarnoja@berkeley.edu>, Kristian Hartikainen <kristian.hartikainen@gmail.com>.
Pseudocode | Yes | Algorithm 1: Latent Space Policy Learning. (An outline of this bottom-up training loop appears after the table.)
Open Source Code | Yes | We have released our code for reproducibility.
Open Datasets | Yes | Our experiments were conducted on several continuous control benchmark tasks from the OpenAI Gym benchmark suite (Brockman et al., 2016). (A minimal interaction loop is sketched after the table.)
Dataset Splits | No | The paper uses standard benchmark tasks from OpenAI Gym but does not explicitly provide training/validation/test split details (e.g., percentages or sample counts) needed for reproduction in the main text.
Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., CPU or GPU models, memory, cloud resources) used to run its experiments.
Software Dependencies | No | The paper mentions using soft actor-critic and OpenAI Gym but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | No | The paper describes the general architecture and experimental procedure but does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations in the main text.
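The layered construction quoted in the Research Type row stacks invertible policy layers: each layer transforms a latent passed down from the layer above, conditioned on the current observation, and the bottom layer's output is the action. Below is a minimal sketch of that composition, assuming each layer is a simple conditional affine bijection; the paper itself uses richer real NVP-style bijectors, and the shapes, weights, and initialization here are purely illustrative.

```python
import numpy as np


class AffineLayer:
    """One policy layer: an invertible affine map of the latent from the layer
    above, conditioned on the observation. Weights are illustrative stand-ins
    for learned parameters."""

    def __init__(self, obs_dim, act_dim, rng):
        self.W_scale = 0.01 * rng.standard_normal((act_dim, obs_dim))
        self.W_shift = 0.01 * rng.standard_normal((act_dim, obs_dim))

    def forward(self, obs, latent):
        scale = np.exp(self.W_scale @ obs)   # strictly positive, so the map is invertible
        shift = self.W_shift @ obs
        return scale * latent + shift


def sample_action(layers, obs, rng):
    """Compose the layers: the top-level latent is standard Gaussian noise,
    each layer transforms it in turn, and the bottom layer's output is the action."""
    h = rng.standard_normal(layers[-1].W_shift.shape[0])
    for layer in reversed(layers):           # layers[-1] is the top, layers[0] the bottom
        h = layer.forward(obs, h)
    return h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs_dim, act_dim = 8, 2
    policy = [AffineLayer(obs_dim, act_dim, rng) for _ in range(2)]  # two stacked layers
    obs = rng.standard_normal(obs_dim)
    print(sample_action(policy, obs, rng))
```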
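The Pseudocode row points to Algorithm 1 (Latent Space Policy Learning). The outline below is a hedged reading of its bottom-up structure: each layer is trained with soft actor-critic against its own objective while the layers beneath it stay fixed, and its latent then becomes the action space exposed to the next layer. The helper names (`make_layer`, `train_layer_with_sac`, `wrap_with_lower_layer`) are hypothetical placeholders, not the released API.

```python
def train_layer_with_sac(layer, env, reward_fn):
    # Placeholder: stands in for running soft actor-critic to optimize this
    # layer's parameters under the maximum-entropy objective.
    pass


def wrap_with_lower_layer(env, layer):
    # Placeholder: the real wrapper would route the next layer's latent
    # through `layer` (conditioned on the observation) before stepping `env`.
    return env


def train_bottom_up(make_layer, env, rewards):
    """Train one layer per entry in `rewards`, freezing each before stacking
    the next; low-level objectives come first and the task reward last."""
    layers = []
    for reward_fn in rewards:
        layer = make_layer()
        train_layer_with_sac(layer, env, reward_fn)
        layers.append(layer)                        # frozen from here on
        env = wrap_with_lower_layer(env, layer)     # next layer acts through this one
    return layers
```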
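The Open Datasets row cites the OpenAI Gym benchmark suite. A minimal interaction loop for one of those tasks is sketched below, assuming the pre-0.26 `gym` step/reset signatures that were current around the paper's release and a working `mujoco-py` install for `Ant-v2`; the random action is only a stand-in for a learned latent space policy.

```python
import gym

env = gym.make("Ant-v2")                 # one of the MuJoCo continuous control benchmarks
obs = env.reset()
done, episode_return = False, 0.0
while not done:
    action = env.action_space.sample()   # stand-in for a learned latent space policy
    obs, reward, done, info = env.step(action)
    episode_return += reward
print("episode return:", episode_return)
env.close()
```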