Markovian State and Action Abstractions for MDPs via Hierarchical MCTS

Authors: Aijun Bai, Siddharth Srivastava, Stuart Russell

Venue: IJCAI 2016

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 experiments; Figures 3a, 3b, 3e, and 3f (x-axis in log scale) show the results of running UCT, UCT(φ), POMCP(M, φ), POMCP(M, φ, O), and smart-POMCP(M, φ, O) on the ROOMS[17, 17, 4] and ROOMS[25, 13, 8] problems. |
| Researcher Affiliation | Collaboration | Aijun Bai (UC Berkeley, aijunbai@berkeley.edu); Siddharth Srivastava (United Technologies Research Center, srivass@utrc.utc.com); Stuart Russell (UC Berkeley, russell@cs.berkeley.edu) |
| Pseudocode | Yes | Figure 2, "POMCP(M, φ, O): Markovian state and action abstractions for MDPs via hierarchical MCTS," contains the pseudocode blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper uses custom problem domains (ROOMS[m, n, k] and C-ROOMS[m, n, k]) that are described in the text, but provides no concrete access information (link, DOI, repository, or citation to an established public dataset). |
| Dataset Splits | No | The paper does not provide dataset split information (exact percentages, sample counts, or splitting methodology) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (CPU/GPU models, processor types, or memory amounts) used to run its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers. |
| Experiment Setup | Yes | The discount factor is γ = 0.98. The maximal planning horizon is H = ⌊log_γ ε⌋ = 341, where ε is set to 0.001. |
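
As a sanity check on the Experiment Setup row: the horizon H = ⌊log_γ ε⌋ is the depth at which the discounted weight γ^H has decayed to roughly ε, so rewards beyond H contribute negligibly and planning can be truncated there. A minimal sketch of this computation follows (the helper name `effective_horizon` is ours, not from the paper):

```python
import math

def effective_horizon(gamma: float, eps: float) -> int:
    """Planning horizon H = floor(log_gamma(eps)).

    At this depth the discounted weight gamma**H has decayed to
    roughly eps, so deeper rewards contribute negligibly.
    """
    # log base gamma of eps = ln(eps) / ln(gamma); both logarithms are
    # negative for 0 < gamma < 1 and 0 < eps < 1, so the ratio is positive.
    return math.floor(math.log(eps) / math.log(gamma))

# Values from the paper's experiment setup: gamma = 0.98, eps = 0.001.
print(effective_horizon(0.98, 0.001))  # -> 341
```

Running this with γ = 0.98 and ε = 0.001 reproduces the H = 341 reported above.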