Hierarchical Monte-Carlo Planning
Authors: Ngo Anh Vien, Marc Toussaint
AAAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the hierarchical MCTS methods on various settings such as a hierarchical MDP, a Bayesian model-based hierarchical RL problem, and a large hierarchical POMDP. |
| Researcher Affiliation | Academia | Ngo Anh Vien, Machine Learning and Robotics Lab, University of Stuttgart, Germany (vien.ngo@ipvs.uni-stuttgart.de); Marc Toussaint, Machine Learning and Robotics Lab, University of Stuttgart, Germany (marc.toussaint@ipvs.uni-stuttgart.de) |
| Pseudocode | Yes | Algorithm 1 (MAIN and EXECUTE procedures), Algorithm 2 (H-UCT), Algorithm 3 (ROLLOUT), and Algorithm 4 (SIMULATE); a generic sketch of this hierarchical planning structure appears after the table. |
| Open Source Code | No | The paper does not contain any explicit statement about making the source code available or provide a link to a code repository. |
| Open Datasets | No | The paper refers to problem 'domains' (e.g., 5x5 Taxi, 10x10 Taxi, Pac-Man) which are environments for reinforcement learning, rather than distinct datasets with specified public availability or citations. No concrete access information like links, DOIs, repositories, or formal citations for publicly available datasets is provided. |
| Dataset Splits | No | The paper does not provide specific train/validation/test dataset splits (percentages, sample counts, or citations to predefined splits). It describes parameters for running experiments (e.g., number of samples, episodes, discount factor) within the reinforcement learning domains. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or library names with version numbers needed to replicate the experiment. |
| Experiment Setup | Yes | For UCT and H-UCT we use 1000 samples (5x5 map) and 2000 episodes (10x10 map), a discount factor of 0.995, and report a mean and its first deviation of the average cumulative rewards of 100 episodes (5x5 map) and 200 episodes (10x10 map) from 10 runs. and The discount factor γ is set to 0.99. and We use particles to represent beliefs and use the similar particle invigoration method as in (Silver and Veness 2010). One reading of the reported statistic is sketched after the table. |
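
The pseudocode row above refers to the paper's MAIN/EXECUTE, H-UCT, ROLLOUT, and SIMULATE procedures. Below is a minimal, hypothetical Python sketch of hierarchical UCT planning in that spirit, not the paper's algorithms: the corridor environment, the task hierarchy (`GO_LEFT`, `GO_RIGHT`, `ROOT`), and all parameter values are invented for illustration.

```python
import math
import random

# Toy problem (assumed for illustration): a 1-D corridor with a small step cost
# and reward 1 at the rightmost cell.
SIZE, GOAL = 7, 6

def step(state, move):
    nxt = min(max(state + move, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else -0.01)

# Hypothetical task hierarchy (not the paper's Taxi or Pac-Man hierarchies).
class Primitive:
    is_primitive = True
    def __init__(self, move):
        self.name, self.move = f"move{move:+d}", move

class Composite:
    is_primitive = False
    def __init__(self, name, subtasks, terminal):
        self.name, self.subtasks, self.terminal = name, subtasks, terminal

LEFT, RIGHT = Primitive(-1), Primitive(+1)
GO_LEFT  = Composite("go_left",  [LEFT, RIGHT], lambda s: s == 0)
GO_RIGHT = Composite("go_right", [LEFT, RIGHT], lambda s: s == GOAL)
ROOT     = Composite("root", [GO_LEFT, GO_RIGHT], lambda s: s == GOAL)

class Node:
    """UCT statistics for one (task, state) pair: visit counts and value per child."""
    def __init__(self, arms):
        self.N = 0
        self.counts = {a: 0 for a in arms}
        self.values = {a: 0.0 for a in arms}
    def select(self, c=1.4):
        untried = [a for a, n in self.counts.items() if n == 0]
        if untried:
            return random.choice(untried)
        return max(self.counts, key=lambda a: self.values[a]
                   + c * math.sqrt(math.log(self.N) / self.counts[a]))
    def update(self, arm, ret):
        self.N += 1
        self.counts[arm] += 1
        self.values[arm] += (ret - self.values[arm]) / self.counts[arm]

def simulate(task, state, trees, depth, gamma=0.99, max_depth=60):
    """One recursive simulation: a composite task picks a subtask with UCB,
    runs it to completion, then continues itself from the resulting state.
    Returns (discounted return, simulated steps used, final state)."""
    if depth >= max_depth:
        return 0.0, 0, state
    if task.is_primitive:
        nxt, r = step(state, task.move)
        return r, 1, nxt
    if task.terminal(state):
        return 0.0, 0, state
    node = trees.setdefault((task.name, state), Node(tuple(task.subtasks)))
    sub = node.select()
    g_sub, n_sub, mid = simulate(sub, state, trees, depth, gamma, max_depth)
    # max(n_sub, 1) guards against zero-step subtasks causing unbounded recursion.
    g_rest, n_rest, end = simulate(task, mid, trees, depth + max(n_sub, 1),
                                   gamma, max_depth)
    g = g_sub + gamma ** n_sub * g_rest
    node.update(sub, g)
    return g, n_sub + n_rest, end

if __name__ == "__main__":
    random.seed(0)
    start, trees = 3, {}
    for _ in range(2000):  # planning simulations from the start state
        simulate(ROOT, start, trees, depth=0)
    root = trees[("root", start)]
    for sub in root.counts:
        print(sub.name, "value:", round(root.values[sub], 3),
              "visits:", root.counts[sub])
```

The key design point mirrored here is that a composite task's tree chooses among subtasks rather than primitive actions, and each completed subtask advances the simulation by several steps before control returns to the parent.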
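The experiment-setup row quotes a "mean and its first deviation of the average cumulative rewards" over a fixed number of episodes from 10 runs. One plausible reading of that statistic, shown below with placeholder data, is to average cumulative reward per run over the reported episodes and then take the mean and standard deviation of those per-run averages; the helper name and the synthetic returns are assumptions for illustration only.

```python
import random
import statistics

def summarize(runs_returns, episodes_per_run):
    """Average the cumulative reward over the first `episodes_per_run` episodes
    of each run, then report the mean and standard deviation across runs."""
    per_run_avg = [sum(r[:episodes_per_run]) / episodes_per_run for r in runs_returns]
    return statistics.mean(per_run_avg), statistics.stdev(per_run_avg)

# Placeholder data standing in for 10 runs of 100 episodes (5x5 Taxi setting).
random.seed(0)
fake_runs = [[random.gauss(4.0, 1.0) for _ in range(100)] for _ in range(10)]
mean, spread = summarize(fake_runs, episodes_per_run=100)
print(f"{mean:.2f} +/- {spread:.2f}")
```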