Learning in POMDPs with Monte Carlo Tree Search
Authors: Sammie Katt, Frans A. Oliehoek, Christopher Amato
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Empirical Evaluation. We conducted an empirical evaluation aimed at 3 goals: The first goal attempts to support the claims made in Section 4 and show that the adaptations to BA-POMCP do not decrease the quality of the resulting policies. Second, we investigate the runtime of those modifications to demonstrate their contribution to the efficiency of BA-POMCP. The last part contains experiments that directly compare the performance per action selection time with the baseline approach of Ross et al. (2011). |
| Researcher Affiliation | Academia | Sammie Katt (1), Frans A. Oliehoek (2), Christopher Amato (1). (1) Northeastern University, Boston, Massachusetts, USA; (2) University of Liverpool, UK. |
| Pseudocode | Yes | Algorithm 1 BA-POMCP($b$, num_sims), Algorithm 2 SIMULATE($\bar{s}$, $d$, $h$), Algorithm 3 BA-POMCP-STEP($\bar{s} = \langle s, \chi \rangle$, $a$), Algorithm 4 R-BA-POMCP-STEP($\bar{s} = \langle s, \chi \rangle$, $a$), Algorithm 5 E-BA-POMCP-STEP($\bar{s} = \langle s, \chi \rangle$, $a$), Algorithm 6 L-BA-POMCP-STEP($\bar{s}_l = \langle s, l, \delta \rangle$, $a$) |
| Open Source Code | No | No explicit statement providing concrete access to the source code for the methodology described in this paper was found. |
| Open Datasets | No | The paper describes using the 'classical Tiger problem' and a 'partially observable extension to the Sysadmin problem', which appear to be simulated environments or problem setups rather than publicly available datasets with concrete access information. |
| Dataset Splits | No | No specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or explicit references to predefined splits) were provided. |
| Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running the experiments were provided. |
| Software Dependencies | No | No specific software dependencies, libraries, or solvers with version numbers were mentioned in the paper. |
| Experiment Setup | Yes | Table 1 (default experiment parameters) lists: γ = 0.95; horizon (h) = 20; # particles in belief = 1000; exploration const = h · (max(R) − min(R)); # episodes = 100; λ = # updated counts = 30. The paper also describes how the counts are perturbed: 'for each count c, we take the true probability of that transition (called p) and (randomly) either subtract or add .15. Note that we do not allow transitions with nonzero probability to fall below 0 by setting those counts to 0.001. Each Dirichlet distribution is then normalized so that its counts sum to 20.' |
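
The count perturbation quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration based only on the quoted text, not the authors' code: the function name `noisy_dirichlet_counts` and its parameters are hypothetical, and the quoted passage does not say how zero-probability transitions are treated, so the sketch assumes they follow the same rule as every other entry.

```python
import random


def noisy_dirichlet_counts(true_probs, noise=0.15, floor=0.001, total=20.0, rng=None):
    """Perturb one transition distribution and rescale it into Dirichlet counts.

    Hypothetical helper: the name, signature, and handling of zero-probability
    entries are assumptions made for illustration.
    """
    rng = rng or random.Random()
    counts = []
    for p in true_probs:
        # Randomly either subtract or add the noise term (0.15 in the paper).
        shifted = p + rng.choice((-noise, noise))
        # Values that would drop to zero or below are set to a small positive count.
        counts.append(shifted if shifted > 0 else floor)
    # Rescale so the counts of this distribution sum to `total` (20 in the paper).
    scale = total / sum(counts)
    return [c * scale for c in counts]


if __name__ == "__main__":
    # Example: a two-outcome distribution with illustrative probabilities.
    print(noisy_dirichlet_counts([0.85, 0.15], rng=random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the perturbation reproducible across runs, which matters when the same noisy counts have to be reused over many episodes.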