Hierarchical Monte-Carlo Planning

Authors: Ngo Anh Vien, Marc Toussaint

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the hierarchical MCTS methods on various settings such as a hierarchical MDP, a Bayesian model-based hierarchical RL problem, and a large hierarchical POMDP.
Researcher Affiliation | Academia | Ngo Anh Vien, Machine Learning and Robotics Lab, University of Stuttgart, Germany (vien.ngo@ipvs.uni-stuttgart.de); Marc Toussaint, Machine Learning and Robotics Lab, University of Stuttgart, Germany (marc.toussaint@ipvs.uni-stuttgart.de)
Pseudocode | Yes | Algorithm 1 (MAIN and EXECUTE procedures), Algorithm 2 (H-UCT), Algorithm 3 (ROLLOUT), Algorithm 4 (SIMULATE).
Open Source Code | No | The paper does not contain any explicit statement about making the source code available, nor does it provide a link to a code repository.
Open Datasets | No | The paper refers to problem 'domains' (e.g., 5x5 Taxi, 10x10 Taxi, Pac-Man), which are reinforcement learning environments rather than distinct datasets with stated public availability or citations. No concrete access information such as links, DOIs, repositories, or formal citations for publicly available datasets is provided.
Dataset Splits | No | The paper does not provide specific train/validation/test dataset splits (percentages, sample counts, or citations to predefined splits). It describes parameters for running experiments (e.g., number of samples, episodes, discount factor) within the reinforcement learning domains.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies or library names with version numbers needed to replicate the experiments.
Experiment Setup | Yes | For UCT and H-UCT we use 1000 samples (5x5 map) and 2000 episodes (10x10 map), a discount factor of 0.995, and report the mean and standard deviation of the average cumulative rewards over 100 episodes (5x5 map) and 200 episodes (10x10 map) from 10 runs. The discount factor γ is set to 0.99. We use particles to represent beliefs and a particle invigoration method similar to that of (Silver and Veness 2010).
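
For readers unfamiliar with the UCT baseline referenced in the setup above, below is a minimal, generic sketch of UCB1-style action selection and value backup, the building block that both UCT and the paper's H-UCT rely on. The node structure, method names, and exploration constant are illustrative assumptions, not the authors' implementation; only the discount factor 0.995 is taken from the quoted setup.

```python
import math
import random

# Generic UCT-style search node: tracks visit counts N(s, a) and mean returns Q(s, a).
# This is an illustrative sketch, not the H-UCT algorithm from the paper.
class Node:
    def __init__(self, actions):
        self.visits = 0
        self.counts = {a: 0 for a in actions}    # N(s, a)
        self.values = {a: 0.0 for a in actions}  # Q(s, a)

    def select_action(self, c=1.0):
        """UCB1: try each action once, then maximize Q(s,a) + c * sqrt(ln N(s) / N(s,a))."""
        untried = [a for a, n in self.counts.items() if n == 0]
        if untried:
            return random.choice(untried)
        log_n = math.log(self.visits)
        return max(self.counts,
                   key=lambda a: self.values[a] + c * math.sqrt(log_n / self.counts[a]))

    def update(self, action, ret):
        """Incrementally update the mean return of the action taken in this simulation."""
        self.visits += 1
        self.counts[action] += 1
        self.values[action] += (ret - self.values[action]) / self.counts[action]


if __name__ == "__main__":
    # Toy usage with two hypothetical actions and a fake discounted return.
    gamma = 0.995  # discount factor quoted for the Taxi experiments
    node = Node(actions=["north", "south"])
    for _ in range(100):
        a = node.select_action(c=1.0)
        reward = 1.0 if a == "north" else 0.0
        node.update(a, reward * gamma)
    print(node.values)
```

In a full planner this selection/backup step would be applied recursively along a simulated trajectory (and, in the hierarchical case, at each level of the task hierarchy); the sketch only shows the per-node bandit logic.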