An Efficient Approach to Model-Based Hierarchical Reinforcement Learning

Authors: Zhuoru Li, Akshay Narayan, Tze-Yun Leong

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We test the framework on common benchmark problems and complex simulated robotic environments. It compares favorably against the state-of-the-art algorithms, and scales well in very large problems. Experiments: We test the empirical performance of CSRL on a set of benchmark experiments, formulated as a robot HRL agent solving different tasks. |
| Researcher Affiliation | Collaboration | School of Computing, National University of Singapore; School of Information Systems, Singapore Management University. lizhuoru@google.com, {anarayan, leongty}@comp.nus.edu.sg, leongty@smu.edu.sg. Currently affiliated with Google Korea, LLC. |
| Pseudocode | Yes | Algorithm 1: CSRL Algorithm; Algorithm 2: Construct SMDP(current_state); Algorithm 3: Simulate Task(s, i). (See the loop skeleton after this table.) |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | This is a variant of the HRL benchmark Taxi problem (Dietterich 1998). We use the 10x10 grid world from Diuk et al. (Diuk, Cohen, and Littman 2008). (See the environment sketch after this table.) |
| Dataset Splits | No | The paper does not provide explicit training/validation/test dataset splits or percentages; it reports experiments in terms of episodes, as is standard in reinforcement learning. |
| Hardware Specification | Yes | The running time is the average of 10 independent runs, on a Xeon E5-2643 v2 3.50GHz using a single thread. |
| Software Dependencies | No | The paper mentions the Webots (Michel 2004) simulator but does not give version numbers for it or for any other software libraries or dependencies. |
| Experiment Setup | Yes | In all experiments, an episode terminates if it does not complete in 1000 steps. We set the exploration threshold m = 1 for all methods. Since R-MAXQ cannot converge with m = 1, we set m = 5 like other existing works (Jong and Stone 2008; Cao and Ray 2012). The reward for navigation actions and opening doors is -1. The reward for the task's unique actions is 40 if it completes the task, and -5 for attempting actions at wrong locations. (See the configuration sketch after this table.) |
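
The Open Datasets row points at benchmark environments rather than downloadable data: a variant of Dietterich's Taxi problem and the 10x10 grid world of Diuk et al. Reproducing the experiments therefore means re-implementing the environment. The sketch below is a minimal, generic Taxi-style grid world, assuming a 10x10 grid, corner landmarks, and the reward values quoted in the Experiment Setup row; it is not the paper's exact variant, and the class and method names are illustrative only.

```python
import random


class TaxiGridWorld:
    """A generic Taxi-style grid world (illustrative, not the paper's exact variant)."""

    ACTIONS = ["north", "south", "east", "west", "pickup", "dropoff"]

    def __init__(self, size=10, landmarks=((0, 0), (0, 9), (9, 0), (9, 9))):
        self.size = size
        self.landmarks = landmarks
        self.reset()

    def reset(self):
        # Random taxi position; passenger and destination at landmark cells.
        self.taxi = (random.randrange(self.size), random.randrange(self.size))
        self.passenger = random.choice(self.landmarks)
        self.destination = random.choice(self.landmarks)
        self.in_taxi = False
        return self._state()

    def _state(self):
        return (self.taxi, self.passenger, self.destination, self.in_taxi)

    def step(self, action):
        x, y = self.taxi
        if action == "north":
            self.taxi = (x, min(y + 1, self.size - 1))
        elif action == "south":
            self.taxi = (x, max(y - 1, 0))
        elif action == "east":
            self.taxi = (min(x + 1, self.size - 1), y)
        elif action == "west":
            self.taxi = (max(x - 1, 0), y)
        elif action == "pickup":
            if not self.in_taxi and self.taxi == self.passenger:
                self.in_taxi = True
            else:
                return self._state(), -5.0, False   # task action at a wrong location
        elif action == "dropoff":
            if self.in_taxi and self.taxi == self.destination:
                return self._state(), 40.0, True    # task completed
            return self._state(), -5.0, False       # task action at a wrong location
        return self._state(), -1.0, False           # default per-step (navigation) cost
```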
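
The Experiment Setup row quotes the key run parameters. The snippet below simply gathers those quoted values into one configuration object; the dataclass, field, and function names are illustrative choices, not identifiers from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Run parameters quoted in the Experiment Setup row (names are illustrative)."""
    max_steps_per_episode: int = 1000       # episode terminates if not completed in 1000 steps
    exploration_threshold_m: int = 1        # m = 1 for all methods ...
    exploration_threshold_m_rmaxq: int = 5  # ... except R-MAXQ, which uses m = 5
    reward_navigation: float = -1.0         # navigation actions and opening doors
    reward_task_success: float = 40.0       # task's unique action that completes the task
    reward_wrong_location: float = -5.0     # attempting the task action at a wrong location


def exploration_threshold(cfg: ExperimentConfig, method: str) -> int:
    """Pick the exploration threshold for a given method, per the quoted setup."""
    return cfg.exploration_threshold_m_rmaxq if method == "R-MAXQ" else cfg.exploration_threshold_m
```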
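
The Pseudocode row lists Algorithm 1 (CSRL Algorithm), Algorithm 2 (Construct SMDP(current_state)), and Algorithm 3 (Simulate Task(s, i)). The paper remains the authoritative description of those procedures; the skeleton below only sketches the control flow that the algorithm titles suggest for a model-based HRL agent. Every function body is a placeholder, and the loop structure, argument lists, and names (`planner`, `task_models`, and so on) are assumptions made for illustration.

```python
def csrl_episode(env, task_models, planner, max_steps=1000):
    """Hypothetical outer loop suggested by the algorithm titles; not the paper's Algorithm 1."""
    state = env.reset()
    for _ in range(max_steps):
        smdp = construct_smdp(state, task_models)        # cf. Algorithm 2: Construct SMDP(current_state)
        task = planner.choose_task(smdp, state)          # choose an abstract task to execute next
        state, experience, done = simulate_task(env, state, task)  # cf. Algorithm 3: Simulate Task(s, i)
        task_models.update(experience)                   # model-based update from observed transitions
        if done:
            break


def construct_smdp(state, task_models):
    """Placeholder: build an SMDP over tasks from the current state and the learned task models."""
    raise NotImplementedError


def simulate_task(env, state, task):
    """Placeholder: execute a task from `state`, returning the new state, experience, and done flag."""
    raise NotImplementedError
```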