Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Hierarchical Policy Search via Return-Weighted Density Estimation
Authors: Takayuki Osa, Masashi Sugiyama
AAAI 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental Results To visualize the performance, we first evaluate HPSDE in toy problems and the puddle world task where the return functions are multi-modal. Subsequently, we show the experiments with the motion planning task for a robotic manipulator, which is a practical application of hierarchical RL. |
| Researcher Affiliation | Academia | Takayuki Osa University of Tokyo 277-0882, Chiba, Japan RIKEN Center for AIP 103-0027, Tokyo, Japan Masashi Sugiyama RIKEN Center for AIP 103-0027, Tokyo, Japan University of Tokyo 277-0882, Chiba, Japan |
| Pseudocode | Yes | Algorithm 1 Hierarchical Policy Search via Return Weighted Density Estimation (HPSDE) |
| Open Source Code | No | The paper does not provide any statements about code release, nor does it include links to a source code repository. |
| Open Datasets | No | The paper discusses task setups such as the 'puddle world task' and 'motion planning for a redundant manipulator' in a 'simulation environment, developed based on VREP', but it does not mention using or providing access to any publicly available or open datasets with proper citations or links. |
| Dataset Splits | No | The paper does not provide specific details regarding training, validation, or test dataset splits, such as percentages, absolute sample counts, or citations to predefined splits. |
| Hardware Specification | No | The paper mentions running simulations and modeling a 'KUKA Light Weight Robot', but it does not provide any specific details about the hardware (e.g., CPU, GPU models, cloud instance types) used to conduct these experiments. |
| Software Dependencies | No | The paper mentions various software components and methods such as 'VREP', 'DMPs', 'Gaussian Process (GP)', 'REPS', 'RWR', and 'HiREPS', but it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | For this task, we used a linear feature function φ(s) = [s , 1] and set Omax = 10 for HPSDE. [...] We set Omax = 20 for HPSDE. |