Memory Bounded Open-Loop Planning in Large POMDPs Using Thompson Sampling

Authors: Thomy Phan, Lenz Belzner, Marie Kiermeier, Markus Friedrich, Kyrill Schmid, Claudia Linnhoff-Popien (pp. 7941-7948)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate POSTS in four large benchmark problems and compare its performance with different tree-based approaches. We show that POSTS achieves competitive performance compared to tree-based open-loop planning and offers a performance-memory tradeoff, making it suitable for partially observable planning with highly restricted computational and memory resources.
Researcher Affiliation | Collaboration | Thomy Phan (LMU Munich, thomy.phan@ifi.lmu.de); Lenz Belzner (Maiborn Wolff, lenz.belzner@maibornwolff.de); Marie Kiermeier (LMU Munich, marie.kiermeier@ifi.lmu.de); Markus Friedrich (LMU Munich, markus.friedrich@ifi.lmu.de); Kyrill Schmid (LMU Munich, kyrill.schmid@ifi.lmu.de); Claudia Linnhoff-Popien (LMU Munich, linnhoff@ifi.lmu.de)
Pseudocode | Yes | Algorithm 1 (Generalized Thompson Sampling):
procedure ThompsonSampling(N_t)
    for a_t ∈ A do
        infer μ1, λ1, α1, β1 from the prior and X̄_{a_t}, σ²_{a_t}, n_{a_t}
        sample (μ_{a_t}, τ_{a_t}) ~ NG(μ1, λ1, α1, β1)
    return argmax_{a_t ∈ A} μ_{a_t}
procedure UpdateBandit(N_t, G_t)
    n_{a_t} ← n_{a_t} + 1
    X̄_{old,a_t}, X̄_{a_t} ← X̄_{a_t}, (n_{a_t} · X̄_{old,a_t} + G_t) / (n_{a_t} + 1)
    s_{a_t} ← [(n_{a_t} − 1) · s_{a_t} + (G_t − X̄_{old,a_t})(G_t − X̄_{a_t})] / n_{a_t}
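A minimal Python sketch of this kind of generalized Thompson sampling bandit, assuming the Normal-Gamma posterior update of Bai et al. (2014); the function names and the Welford-style incremental variance update are illustrative, not the authors' code:

```python
import math
import random

def sample_normal_gamma(mu0, lam0, alpha0, beta0, mean, var, n):
    """Update the Normal-Gamma prior with n observed returns
    (summarized by their sample mean and variance), then draw a
    mean estimate mu from the resulting posterior."""
    mu1 = (lam0 * mu0 + n * mean) / (lam0 + n)
    lam1 = lam0 + n
    alpha1 = alpha0 + n / 2.0
    beta1 = (beta0 + 0.5 * n * var
             + (lam0 * n * (mean - mu0) ** 2) / (2.0 * (lam0 + n)))
    # Precision tau ~ Gamma(alpha1, rate=beta1); gammavariate takes a scale.
    tau = random.gammavariate(alpha1, 1.0 / beta1)
    return random.gauss(mu1, 1.0 / math.sqrt(lam1 * tau))

def thompson_select(stats, prior):
    """stats maps action -> (mean, var, n).
    Pick the action whose posterior sample is largest."""
    return max(stats, key=lambda a: sample_normal_gamma(*prior, *stats[a]))

def update_bandit(stats, action, ret):
    """Incrementally fold the return ret into the action's
    running mean and (biased) sample variance."""
    mean, var, n = stats[action]
    n += 1
    new_mean = mean + (ret - mean) / n
    new_var = ((n - 1) * var + (ret - mean) * (ret - new_mean)) / n
    stats[action] = (new_mean, new_var, n)
```

Usage: call `thompson_select` to choose an action, simulate to obtain a return G_t, then call `update_bandit` with that return, mirroring the two procedures in Algorithm 1.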
Open Source Code | No | The paper does not provide any concrete access information (e.g., a specific repository link or an explicit code release statement) for the source code of the methodology described.
Open Datasets | Yes | We tested POSTS in the Rock Sample, Battleship, and PocMan domains, which are well-known POMDP benchmark problems for decision making in POMDPs (Silver and Veness 2010; Somani et al. 2013; Bai et al. 2014).
Dataset Splits | No | The paper describes experiments in simulation environments (Rock Sample, Battleship, PocMan) for POMDP planning, not traditional machine learning datasets. As such, it does not specify explicit training/validation/test dataset splits.
Hardware Specification | No | The paper mentions computational and memory resources but does not provide specific details about the hardware (e.g., CPU or GPU models) used to run the experiments.
Software Dependencies | No | The paper describes algorithms (e.g., UCB1, Thompson Sampling, POMCP) and custom implementations (POOLTS, POSTS) but does not list specific software dependencies with version numbers.
Experiment Setup | Yes | For each domain, we set the discount factor γ as proposed in (Silver and Veness 2010). For POMCP and POOLTS we set the UCB1 exploration constant c to the reward range of each domain as proposed in (Silver and Veness 2010). We focus on uninformative priors with µ0 = 0, α0 = 1, and λ0 = 0.01 as proposed in (Bai et al. 2014). With this setting, β0 controls the degree of initial exploration during the planning phase, so its impact on the performance of POOLTS and POSTS is evaluated. The results are shown in Fig. 2 for β0 = 1000, 4000, 32000 for POOLTS and POSTS; in Fig. 3 for nb = 4096 and β0 = 1000, 4000, 32000; and in Fig. 4 for nb = 4096, T = 100, and β0 = 1000, 4000, 32000.