reproducibilityindex.ai

Approximate Linear Programming for Constrained Partially Observable Markov Decision Processes

Authors: Pascal Poupart, Aarti Malhotra, Pei Pei, Kee-Eung Kim, Bongseok Goh, Michael Bowling

AAAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this work, we describe a technique based on approximate linear programming to optimize policies in CPOMDPs. The approach outperforms a constrained version of point-based value iteration on a suite of benchmark problems.
Researcher Affiliation	Academia	David R. Cheriton School of Computer Science, University of Waterloo, Canada Department of Computer Science, Korean Advanced Institute of Science and Technology, Korea Department of Computing Science, University of Alberta, Canada
Pseudocode	Yes	Algorithm 1 Approximate LP; Algorithm 2 Iterative Approximate Linear Programming; Algorithm 3 Belief Generation
Open Source Code	No	The paper states that CALP and CPBVI are implemented in Matlab, but does not provide any links or explicit statements about the code being open-sourced.
Open Datasets	Yes	The toy, qcd and ncity problems are CPOMDP benchmarks from (Kim et al. 2011). The remaining problems are single objective POMDPs from http://www.pomdp.org that we augmented with costs.
Dataset Splits	No	The paper describes evaluation through simulations and convergence criteria, but it does not specify explicit train/validation/test dataset splits in the conventional sense of partitioning a static dataset.
Hardware Specification	Yes	Both CPBVI and CALP are implemented in Matlab without any explicit parallelization (other than the automated parallelization performed by Matlab s virtual machine) and were run on a 64bit Cent OS Linux machine with 24 cores (1.2 GHz), 132 Gb of RAM, Matlab R2013b and CPLEX 12.6.
Software Dependencies	Yes	Matlab R2013b and CPLEX 12.6.
Experiment Setup	Yes	The algorithms ran until convergence of the optimal value to 3 signiﬁcant digits or a time limit of 1000 seconds (which ever occured ﬁrst). The expected reward and cost were computed by running 1000 simulations of 1000 steps for CPBVI.