Perception-Aware Point-Based Value Iteration for Partially Observable Markov Decision Processes

Authors: Mahsa Ghasemi, Ufuk Topcu

IJCAI 2019

Reproducibility assessment. Each entry below gives the reproducibility variable, the assessed result, and the LLM response (supporting evidence or explanation).
Research Type: Experimental
"To evaluate the proposed algorithm for active perception and planning, we implement the point-based value iteration solver for AP2-POMDPs. We initialize the belief set by uniform sampling from B [Devroye, 1986]. To focus on the effect of perception, we keep the belief set fixed throughout the iterations. However, one can incorporate any sampling method, such as the ones proposed by Kurniawati et al. [2008] and Smith and Simmons [2012]. The α vectors are initialized to (1/(1-γ)) · min_{s,a} R(s,a) · Ones(|S|) [Shani et al., 2013]. Furthermore, to speed up the solver, one can employ a randomized backup step, as suggested by Spaan and Vlassis [2005]. The solver terminates once the difference between value functions in two consecutive iterations falls below a predefined threshold. We also implemented a random perception policy that selects a subset of information sources, uniformly at random, at each backup step. ... (Section 5.1, Robotic Navigation in 1-D Grid) ... We evaluate the computed policy by running 1000 Monte Carlo simulations. The robot starts at the origin and its initial belief is uniform. Figure 4(a) demonstrates the discounted cumulative reward, averaged over 1000 runs, for random selection of 1 and 2 cameras and greedy selection of 1 and 2 cameras." (A sketch of such a solver loop appears below the assessment entries.)

Researcher Affiliation: Academia
Mahsa Ghasemi and Ufuk Topcu, University of Texas at Austin ({mahsa.ghasemi, utopcu}@utexas.edu)

Pseudocode: Yes
Algorithm 1: Greedy policy for perception action ... Algorithm 2: Generic algorithm for point-based solvers [Araya et al., 2010] ... Algorithm 3: Backup step for AP2-POMDP. (An illustrative greedy-selection sketch appears below the assessment entries.)

Open Source Code: No
The paper does not provide any links to open-source code or explicitly state that code for their methodology is released.

Open Datasets: No
The paper describes simulation scenarios (1-D and 2-D grids) and mentions initializing belief sets and running Monte Carlo simulations. It does not refer to a publicly available pre-existing dataset that was used for training or evaluation.

Dataset Splits: No
The paper describes running Monte Carlo simulations to evaluate the policy but does not specify training, validation, or test dataset splits in the conventional sense (e.g., percentages or sample counts for data partitions).

Hardware Specification: No
The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.

Software Dependencies: No
The paper describes the algorithms and their implementation but does not list any specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).

Experiment Setup: Yes
"We initialize the belief set by uniform sampling from B [Devroye, 1986]. ... The α vectors are initialized to (1/(1-γ)) · min_{s,a} R(s,a) · Ones(|S|) [Shani et al., 2013]. ... The solver terminates once the difference between value functions in two consecutive iterations falls below a predefined threshold. ... The robot starts at the origin and its initial belief is uniform. ... The reward is 10 at the goal state, -4 at the obstacles, and -1 in other states." (A sketch of the grid rewards and the Monte Carlo evaluation appears below.)
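
For context on the Research Type and Experiment Setup entries, the following is a minimal sketch of a generic point-based value iteration loop consistent with the quoted setup: a fixed belief set sampled uniformly from the simplex, α vectors initialized to (1/(1-γ)) · min_{s,a} R(s,a) · Ones(|S|), and termination once the value change between iterations falls below a threshold. It is not the paper's implementation; the function names, the `backup_point` placeholder (standing in for the perception-aware backup of Algorithm 3), and the uniform-spacings simplex sampler are illustrative assumptions.

```python
import numpy as np

def sample_belief_simplex(num_beliefs, num_states, rng):
    """Uniformly sample belief points from the probability simplex using the
    standard uniform-spacings construction (one way to realize the uniform
    sampling the paper attributes to Devroye [1986])."""
    u = np.sort(rng.uniform(size=(num_beliefs, num_states - 1)), axis=1)
    bounds = np.hstack([np.zeros((num_beliefs, 1)), u, np.ones((num_beliefs, 1))])
    return np.diff(bounds, axis=1)  # each row sums to 1

def point_based_value_iteration(R, gamma, backup_point, num_beliefs=100,
                                tol=1e-4, seed=0):
    """Generic point-based solver skeleton (sketch, not the paper's code).

    R            : |S| x |A| reward matrix.
    backup_point : placeholder callable, backup_point(belief, alphas) -> new
                   alpha vector of length |S|; the paper's perception-aware
                   backup (Algorithm 3) would slot in here.
    """
    num_states = R.shape[0]
    rng = np.random.default_rng(seed)
    beliefs = sample_belief_simplex(num_beliefs, num_states, rng)  # fixed belief set

    # Pessimistic initialization: (1/(1-gamma)) * min_{s,a} R(s,a) * ones(|S|),
    # one alpha vector per belief point (an assumption of this sketch).
    alpha_init = (1.0 / (1.0 - gamma)) * R.min() * np.ones(num_states)
    alphas = np.tile(alpha_init, (num_beliefs, 1))

    while True:
        new_alphas = np.array([backup_point(b, alphas) for b in beliefs])
        old_values = np.max(beliefs @ alphas.T, axis=1)
        new_values = np.max(beliefs @ new_alphas.T, axis=1)
        alphas = new_alphas
        # Terminate once the value function changes by less than `tol`
        # between two consecutive iterations.
        if np.max(np.abs(new_values - old_values)) < tol:
            return beliefs, alphas
```

A randomized backup in the spirit of Spaan and Vlassis [2005] would replace the loop over all belief points with backups at randomly chosen points until every point's value has improved.
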
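
The Pseudocode entry lists a greedy policy for the perception action (Algorithm 1) and the random baseline used in the experiments. The sketch below shows only the standard greedy subset-selection pattern and the uniform-random baseline; `greedy_perception_action`, `marginal_gain`, and `random_perception_action` are assumed names, and the actual scoring objective of Algorithm 1 is not reproduced here.

```python
import numpy as np

def greedy_perception_action(sources, k, marginal_gain):
    """Greedy subset selection in the spirit of Algorithm 1: repeatedly add
    the information source with the largest marginal gain given the sources
    already selected, until k sources are chosen.
    `marginal_gain(source, selected)` is a placeholder for the paper's
    perception objective (e.g., an information measure over the belief)."""
    selected = []
    remaining = set(sources)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda s: marginal_gain(s, selected))
        selected.append(best)
        remaining.remove(best)
    return selected

def random_perception_action(sources, k, rng=None):
    """Random baseline described in the experiments: select k information
    sources uniformly at random at each backup step."""
    rng = np.random.default_rng() if rng is None else rng
    return list(rng.choice(np.asarray(list(sources)), size=k, replace=False))
```
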
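
Finally, the Experiment Setup entry quotes the 1-D grid rewards and the Monte Carlo evaluation protocol. The sketch below mirrors only what the quote states (rewards of +10/-4/-1, 1000 runs, start at the origin, uniform initial belief); the grid length, goal index, obstacle positions, and the `policy`, `step_env`, and `update_belief` callables are placeholder assumptions.

```python
import numpy as np

def make_1d_grid_rewards(num_cells=10, goal=9, obstacles=(3, 6)):
    """State rewards as quoted: +10 at the goal, -4 at obstacles, -1 elsewhere.
    The grid length, goal index, and obstacle indices are placeholders, not
    values reported in the quoted text."""
    R = -1.0 * np.ones(num_cells)
    R[list(obstacles)] = -4.0
    R[goal] = 10.0
    return R

def monte_carlo_return(policy, step_env, update_belief, rewards, gamma,
                       horizon, num_runs=1000):
    """Average discounted cumulative reward over Monte Carlo rollouts,
    mirroring the quoted protocol (1000 runs, robot starting at the origin
    with a uniform initial belief). Placeholder callables:
    policy(belief) -> action,
    step_env(state, action) -> (next_state, observation),
    update_belief(belief, action, observation) -> next belief."""
    num_states = len(rewards)
    returns = []
    for _ in range(num_runs):
        state = 0                                        # start at the origin
        belief = np.full(num_states, 1.0 / num_states)   # uniform initial belief
        total, discount = 0.0, 1.0
        for _ in range(horizon):
            action = policy(belief)
            state, obs = step_env(state, action)
            total += discount * rewards[state]
            discount *= gamma
            belief = update_belief(belief, action, obs)
        returns.append(total)
    return float(np.mean(returns))
```
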