Diversity-Driven Extensible Hierarchical Reinforcement Learning

Authors: Yuhang Song, Jianyi Wang, Thomas Lukasiewicz, Zhenghua Xu, Mai Xu

AAAI 2019, pp. 4992-4999

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental studies evaluate DEHRL with nine baselines from four perspectives in two domains; the results show that DEHRL outperforms the state-of-the-art baselines in all four aspects.
Researcher Affiliation | Academia | 1 Department of Computer Science, University of Oxford, UK; 2 School of Electronic and Information Engineering, Beihang University, China; 3 State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, China
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Easy-to-run code has been released to further clarify the details and facilitate future research, where evaluations and visualizations on more domains, such as Montezuma's Revenge and PyBullet (an open-source alternative to MuJoCo), can also be found: https://github.com/YuhangSong/DEHRL
Open Datasets | No | The paper uses game environments (OverCooked, Minecraft) that are not explicitly referred to as publicly available datasets with concrete access information (link, citation, repository).
Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages, sample counts, or references to predefined splits) for training, validation, or testing.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) used for running the experiments are mentioned in the paper.
Software Dependencies | No | The paper mentions the PPO algorithm and deep neural networks but does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks.
Experiment Setup | Yes | The important hyper-parameters of DEHRL are summarized in Table 1, while other details (e.g., neural network architectures and hyper-parameters in the policy training algorithm) are provided in (Song et al. 2018a). Table 1 (settings of DEHRL): A0 = 16, A1 = 5, A2 = 5, T0 = 1, T1 = 1*4, T2 = 1*4*12 (see the illustrative sketch below).
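For concreteness, the Table 1 settings could be collected in a small configuration object such as the sketch below. This is a minimal, hypothetical illustration: the key names (num_subpolicies, temporal_extents) and the reading of A^l as the number of sub-policies at hierarchy level l and T^l as its temporal extent in base-environment steps are assumptions, not definitions taken from the paper or the released DEHRL code.

# Illustrative (hypothetical) encoding of the Table 1 settings of DEHRL.
# The dictionary keys and the reading of A/T are assumptions: A^l is read
# as the number of sub-policies at level l, and T^l as the temporal extent
# of one level-l action measured in base-environment steps.
dehrl_settings = {
    "num_subpolicies": {0: 16, 1: 5, 2: 5},                # A0, A1, A2
    "temporal_extents": {0: 1, 1: 1 * 4, 2: 1 * 4 * 12},   # T0, T1, T2
}

if __name__ == "__main__":
    for level in range(3):
        print(f"level {level}: "
              f"{dehrl_settings['num_subpolicies'][level]} sub-policies, "
              f"acting every {dehrl_settings['temporal_extents'][level]} base steps")

Read this way, a level-1 sub-policy would span 4 base steps and a level-2 sub-policy 1*4*12 = 48 base steps, but this interpretation should be checked against the paper and the released code.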