Hierarchical Monte-Carlo Planning

Authors: Ngo Anh Vien, Marc Toussaint

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the hierarchical MCTS methods on various settings such as a hierarchical MDP, a Bayesian model-based hierarchical RL problem, and a large hierarchical POMDP.
Researcher Affiliation | Academia | Ngo Anh Vien, Machine Learning and Robotics Lab, University of Stuttgart, Germany (vien.ngo@ipvs.uni-stuttgart.de); Marc Toussaint, Machine Learning and Robotics Lab, University of Stuttgart, Germany (marc.toussaint@ipvs.uni-stuttgart.de)
Pseudocode | Yes | Algorithm 1 (MAIN and EXECUTE procedures), Algorithm 2 (H-UCT), Algorithm 3 (ROLLOUT), Algorithm 4 (SIMULATE).
Open Source Code | No | The paper does not contain any explicit statement about making the source code available, nor does it provide a link to a code repository.
Open Datasets | No | The paper refers to problem 'domains' (e.g., 5x5 Taxi, 10x10 Taxi, Pac-Man), which are reinforcement learning environments rather than distinct datasets with stated public availability or citations. No concrete access information such as links, DOIs, repositories, or formal citations for publicly available datasets is provided.
Dataset Splits | No | The paper does not provide specific train/validation/test dataset splits (percentages, sample counts, or citations to predefined splits). It describes parameters for running experiments (e.g., number of samples, episodes, discount factor) within the reinforcement learning domains.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies or library names with version numbers needed to replicate the experiments.
Experiment Setup | Yes | For UCT and H-UCT we use 1000 samples (5x5 map) and 2000 episodes (10x10 map), a discount factor of 0.995, and report the mean and standard deviation of the average cumulative rewards over 100 episodes (5x5 map) and 200 episodes (10x10 map) from 10 runs. The discount factor γ is set to 0.99. We use particles to represent beliefs and a particle invigoration method similar to that of (Silver and Veness 2010).
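
For readers unfamiliar with the UCT baseline referenced in the setup above, below is a minimal, generic sketch of UCB1-style action selection and value backup, the building block that both UCT and the paper's H-UCT rely on. The node structure, method names, and exploration constant are illustrative assumptions, not the authors' implementation; only the discount factor 0.995 is taken from the quoted setup.

```python
import math
import random

# Generic UCT-style search node: tracks visit counts N(s, a) and mean returns Q(s, a).
# This is an illustrative sketch, not the H-UCT algorithm from the paper.
class Node:
    def __init__(self, actions):
        self.visits = 0
        self.counts = {a: 0 for a in actions}    # N(s, a)
        self.values = {a: 0.0 for a in actions}  # Q(s, a)

    def select_action(self, c=1.0):
        """UCB1: try each action once, then maximize Q(s,a) + c * sqrt(ln N(s) / N(s,a))."""
        untried = [a for a, n in self.counts.items() if n == 0]
        if untried:
            return random.choice(untried)
        log_n = math.log(self.visits)
        return max(self.counts,
                   key=lambda a: self.values[a] + c * math.sqrt(log_n / self.counts[a]))

    def update(self, action, ret):
        """Incrementally update the mean return of the action taken in this simulation."""
        self.visits += 1
        self.counts[action] += 1
        self.values[action] += (ret - self.values[action]) / self.counts[action]


if __name__ == "__main__":
    # Toy usage with two hypothetical actions and a fake discounted return.
    gamma = 0.995  # discount factor quoted for the Taxi experiments
    node = Node(actions=["north", "south"])
    for _ in range(100):
        a = node.select_action(c=1.0)
        reward = 1.0 if a == "north" else 0.0
        node.update(a, reward * gamma)
    print(node.values)
```

In a full planner this selection/backup step would be applied recursively along a simulated trajectory (and, in the hierarchical case, at each level of the task hierarchy); the sketch only shows the per-node bandit logic.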