Hierarchical Policy Search via Return-Weighted Density Estimation
Authors: Takayuki Osa, Masashi Sugiyama
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental Results: To visualize the performance, we first evaluate HPSDE in toy problems and the puddle world task where the return functions are multi-modal. Subsequently, we show the experiments with the motion planning task for a robotic manipulator, which is a practical application of hierarchical RL. |
| Researcher Affiliation | Academia | Takayuki Osa: University of Tokyo, 277-0882, Chiba, Japan; RIKEN Center for AIP, 103-0027, Tokyo, Japan. Masashi Sugiyama: RIKEN Center for AIP, 103-0027, Tokyo, Japan; University of Tokyo, 277-0882, Chiba, Japan. |
| Pseudocode | Yes | Algorithm 1: Hierarchical Policy Search via Return-Weighted Density Estimation (HPSDE) |
| Open Source Code | No | The paper does not provide any statements about code release, nor does it include links to a source code repository. |
| Open Datasets | No | The paper discusses task setups such as the 'puddle world task' and 'motion planning for a redundant manipulator' in a 'simulation environment, developed based on VREP', but it does not mention using or providing access to any publicly available or open datasets with proper citations or links. |
| Dataset Splits | No | The paper does not provide specific details regarding training, validation, or test dataset splits, such as percentages, absolute sample counts, or citations to predefined splits. |
| Hardware Specification | No | The paper mentions running simulations and modeling a 'KUKA Light Weight Robot', but it does not provide any specific details about the hardware (e.g., CPU, GPU models, cloud instance types) used to conduct these experiments. |
| Software Dependencies | No | The paper mentions various software components and methods such as 'VREP', 'DMPs', 'Gaussian Process (GP)', 'REPS', 'RWR', and 'HiREPS', but it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | For this task, we used a linear feature function φ(s) = [s, 1] and set Omax = 10 for HPSDE. [...] We set Omax = 20 for HPSDE. |
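
The linear feature function quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code (none was released); the function name `linear_features`, the use of NumPy, and the example weight vector are assumptions made only for illustration. The sketch shows that appending a constant 1 to the state lets a single weight vector carry both the linear term and a bias.

```python
import numpy as np


def linear_features(s):
    """Return phi(s) = [s, 1]: the state with a constant 1 appended.

    Hypothetical helper for illustration; not taken from the paper.
    """
    s = np.atleast_1d(np.asarray(s, dtype=float))
    return np.concatenate([s, [1.0]])


# Example: a 2-D state becomes a 3-D feature vector.
phi = linear_features([0.5, -1.0])

# Illustrative weights (assumed values): the last entry acts as a bias.
w = np.array([2.0, 1.0, 0.5])
value = float(w @ phi)  # 2.0 * 0.5 + 1.0 * (-1.0) + 0.5 * 1.0
```

With this representation, any model that is linear in φ(s) is affine in the state s, which is the usual reason for appending the constant feature.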