Hierarchical Policy Search via Return-Weighted Density Estimation

Authors: Takayuki Osa, Masashi Sugiyama

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental Results: To visualize the performance, we first evaluate HPSDE in toy problems and the puddle world task where the return functions are multi-modal. Subsequently, we show the experiments with the motion planning task for a robotic manipulator, which is a practical application of hierarchical RL.
Researcher Affiliation | Academia | Takayuki Osa (University of Tokyo, 277-0882, Chiba, Japan; RIKEN Center for AIP, 103-0027, Tokyo, Japan); Masashi Sugiyama (RIKEN Center for AIP, 103-0027, Tokyo, Japan; University of Tokyo, 277-0882, Chiba, Japan)
Pseudocode | Yes | Algorithm 1 Hierarchical Policy Search via Return Weighted Density Estimation (HPSDE)
Open Source Code | No | The paper does not provide any statements about code release, nor does it include links to a source code repository.
Open Datasets | No | The paper discusses task setups such as the 'puddle world task' and 'motion planning for a redundant manipulator' in a 'simulation environment, developed based on VREP', but it does not mention using or providing access to any publicly available or open datasets with proper citations or links.
Dataset Splits | No | The paper does not provide specific details regarding training, validation, or test dataset splits, such as percentages, absolute sample counts, or citations to predefined splits.
Hardware Specification | No | The paper mentions running simulations and modeling a 'KUKA Light Weight Robot', but it does not provide any specific details about the hardware (e.g., CPU, GPU models, cloud instance types) used to conduct these experiments.
Software Dependencies | No | The paper mentions various software components and methods such as 'VREP', 'DMPs', 'Gaussian Process (GP)', 'REPS', 'RWR', and 'HiREPS', but it does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | For this task, we used a linear feature function φ(s) = [s, 1] and set Omax = 10 for HPSDE. [...] We set Omax = 20 for HPSDE.
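The linear feature function quoted in the Experiment Setup row can be illustrated with a minimal sketch. It assumes that φ(s) = [s, 1] means the state vector with a constant bias term appended. The names `linear_features` and `linear_mean` and the weight vector are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
from typing import List, Sequence


def linear_features(s: Sequence[float]) -> List[float]:
    """Return phi(s) = [s, 1]: the state with a constant 1 appended (assumed reading)."""
    return [float(x) for x in s] + [1.0]


def linear_mean(weights: Sequence[float], s: Sequence[float]) -> float:
    """Hypothetical helper: a quantity that is linear in phi(s), i.e. w . phi(s)."""
    phi = linear_features(s)
    if len(weights) != len(phi):
        raise ValueError("weights must have one entry per feature, including the bias")
    return sum(w * f for w, f in zip(weights, phi))


# Example with a two-dimensional state; the last weight multiplies the bias term.
features = linear_features([0.2, -0.5])
value = linear_mean([1.0, 2.0, 3.0], [0.5, 0.25])
```

Appending the constant 1 lets a purely linear map in φ(s) carry an offset, which is the usual reason for this kind of feature.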