Learning in POMDPs with Monte Carlo Tree Search

Authors: Sammie Katt, Frans A. Oliehoek, Christopher Amato

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental 5. Empirical Evaluation. We conducted an empirical evaluation aimed at 3 goals: the first goal is to support the claims made in Section 4 and show that the adaptations to BA-POMCP do not decrease the quality of the resulting policies. Second, we investigate the runtime of those modifications to demonstrate their contribution to the efficiency of BA-POMCP. The last part contains experiments that directly compare the performance per action-selection time with the baseline approach of Ross et al. (2011).
Researcher Affiliation Academia Sammie Katt (1), Frans A. Oliehoek (2), Christopher Amato (1); (1) Northeastern University, Boston, Massachusetts, USA; (2) University of Liverpool, UK.
Pseudocode Yes Algorithm 1 BA-POMCP(b, num_sims), Algorithm 2 SIMULATE(s, d, h), Algorithm 3 BA-POMCP-STEP(s̄ = ⟨s, χ⟩, a), Algorithm 4 R-BA-POMCP-STEP(s̄ = ⟨s, χ⟩, a), Algorithm 5 E-BA-POMCP-STEP(s̄ = ⟨s, χ⟩, a), Algorithm 6 L-BA-POMCP-STEP(s̄_l = ⟨s, l, δ⟩, a). (A hedged sketch of such a simulation step appears below the table.)
Open Source Code No No explicit statement providing concrete access to the source code for the methodology described in this paper was found.
Open Datasets No The paper describes using the 'classical Tiger problem' and a 'partially observable extension to the Sysadmin problem', which appear to be simulated environments or problem setups rather than publicly available datasets with concrete access information.
Dataset Splits No No specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or explicit references to predefined splits) were provided.
Hardware Specification No No specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running the experiments were provided.
Software Dependencies No No specific software dependencies, libraries, or solvers with version numbers were mentioned in the paper.
Experiment Setup Yes Table 1: Default experiment parameters, listing: γ = 0.95; horizon (h) = 20; # particles in belief = 1000; exploration const = h · (max(R) − min(R)); # episodes = 100; λ = # updated counts = 30. Also: 'for each count c, we take the true probability of that transition (called p) and (randomly) either subtract or add .15. Note that we do not allow transitions with nonzero probability to fall below 0 by setting those counts to 0.001. The counts of each Dirichlet distribution are then normalized to sum to 20.' (A count-construction sketch appears below the table.)
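
For the Pseudocode row, the following is a minimal Python sketch of what one BA-POMCP-style simulation step could look like: the augmented state pairs a domain state s with Dirichlet counts χ over the transition and observation dynamics, and a step samples s' and o from the normalized counts before incrementing them. The function name, the chi_T/chi_O count layout, and reward_fn are illustrative assumptions, not the paper's pseudocode.

    import numpy as np

    def ba_pomcp_step(s, chi_T, chi_O, a, reward_fn, rng=None):
        """One simulated step in an augmented state ⟨s, χ⟩ (sketch, assumed layout).

        chi_T[s][a] and chi_O[s'][a] are 1-D arrays of Dirichlet counts.
        """
        rng = np.random.default_rng() if rng is None else rng
        # Sample the next state from the expected transition model implied by the counts.
        t_counts = chi_T[s][a]
        s_next = rng.choice(len(t_counts), p=t_counts / t_counts.sum())
        # Sample the observation from the expected observation model implied by the counts.
        o_counts = chi_O[s_next][a]
        o = rng.choice(len(o_counts), p=o_counts / o_counts.sum())
        # Update the counts of the sampled transition and observation in place.
        chi_T[s][a][s_next] += 1
        chi_O[s_next][a][o] += 1
        return s_next, o, reward_fn(s, a, s_next)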
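
The prior-count construction quoted in the Experiment Setup row can be made concrete with a short sketch. The Python snippet below is one illustrative reading of that description, not the authors' code; the function name, the per-(s, a) array layout, and the handling of zero-probability transitions are assumptions.

    import numpy as np

    def perturbed_dirichlet_counts(true_probs, noise=0.15, floor=0.001,
                                   total=20.0, rng=None):
        """Build prior counts for one (s, a) pair from its true transition probabilities."""
        rng = np.random.default_rng() if rng is None else rng
        p = np.asarray(true_probs, dtype=float)
        # Randomly add or subtract the noise term (.15) from each true probability.
        signs = rng.choice([-1.0, 1.0], size=p.shape)
        counts = p + signs * noise
        # Transitions with nonzero true probability may not fall below 0: set them to 0.001.
        counts = np.where((p > 0) & (counts <= 0), floor, counts)
        counts = np.clip(counts, 0.0, None)
        # Normalize the counts of this Dirichlet distribution to sum to 20.
        return total * counts / counts.sum()

    # Example: a Tiger-like transition row with true probabilities [0.85, 0.15].
    print(perturbed_dirichlet_counts([0.85, 0.15]))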