Approximate Linear Programming for Constrained Partially Observable Markov Decision Processes

Authors: Pascal Poupart, Aarti Malhotra, Pei Pei, Kee-Eung Kim, Bongseok Goh, Michael Bowling

AAAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this work, we describe a technique based on approximate linear programming to optimize policies in CPOMDPs. The approach outperforms a constrained version of point-based value iteration on a suite of benchmark problems.
Researcher Affiliation Academia David R. Cheriton School of Computer Science, University of Waterloo, Canada Department of Computer Science, Korean Advanced Institute of Science and Technology, Korea Department of Computing Science, University of Alberta, Canada
Pseudocode Yes Algorithm 1 Approximate LP; Algorithm 2 Iterative Approximate Linear Programming; Algorithm 3 Belief Generation
Open Source Code No The paper states that CALP and CPBVI are implemented in Matlab, but does not provide any links or explicit statements about the code being open-sourced.
Open Datasets Yes The toy, qcd and ncity problems are CPOMDP benchmarks from (Kim et al. 2011). The remaining problems are single objective POMDPs from http://www.pomdp.org that we augmented with costs.
Dataset Splits No The paper describes evaluation through simulations and convergence criteria, but it does not specify explicit train/validation/test dataset splits in the conventional sense of partitioning a static dataset.
Hardware Specification Yes Both CPBVI and CALP are implemented in Matlab without any explicit parallelization (other than the automated parallelization performed by Matlab s virtual machine) and were run on a 64bit Cent OS Linux machine with 24 cores (1.2 GHz), 132 Gb of RAM, Matlab R2013b and CPLEX 12.6.
Software Dependencies Yes Matlab R2013b and CPLEX 12.6.
Experiment Setup Yes The algorithms ran until convergence of the optimal value to 3 significant digits or a time limit of 1000 seconds (which ever occured first). The expected reward and cost were computed by running 1000 simulations of 1000 steps for CPBVI.