rho-POMDPs have Lipschitz-Continuous epsilon-Optimal Value Functions

Authors: Mathieu Fehr, Olivier Buffet, Vincent Thomas, Jilles Dibangoye

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Then, value function approximators are proposed for both upper- and lower-bounding the optimal value function, which are shown to provide uniformly improvable bounds. This allows proposing two algorithms derived from HSVI which are empirically evaluated on various benchmark problems.
Researcher Affiliation | Academia | Mathieu Fehr (1), Olivier Buffet (2), Vincent Thomas (2), Jilles Dibangoye (3). (1) École Normale Supérieure de la rue d'Ulm, Paris, France; (2) Université de Lorraine, CNRS, Inria, LORIA, Nancy, France; (3) Université de Lyon, INSA Lyon, Inria, CITI, Lyon, France
Pseudocode | Yes | Algorithm 1: Heuristic Search Value Iteration & Inc-lc-HSVI
Open Source Code | Yes | Full code available here: https://gitlab.inria.fr/buffet/lc-hsvi-nips18
Open Datasets | Yes | The former problems [are] a diverse set taken from Cassandra's POMDP page (footnote 3).
Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits. It mentions running algorithms on benchmark problems and setting an epsilon threshold for convergence, but not how data was partitioned into splits.
Hardware Specification | Yes | The Java program (footnote 4) is run on an i5 CPU M540 at 2.53 GHz.
Software Dependencies | No | The paper mentions a 'Java program' but does not specify its version or any other software dependencies with version numbers required for reproducibility.
Experiment Setup | Yes | We run x-HSVI (x ∈ {pwlc, pw, lc, inc-lc}) on all benchmark problems (with the exception of pwlc-HSVI, which is not run on ρ-POMDPs), setting ϵ = 0.1 and a timeout of 600 s. In inc-lc-HSVI, λ is initially set to 1. L and U are initialized (i) for POMDPs, using HSVI1's blind estimate and MDP estimate, and (ii) for ρ-POMDPs, using Rmin / (1 − γ) and Rmax / (1 − γ).
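
The ρ-POMDP initialization quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not taken from the authors' Java program. The function names `initial_bounds` and `has_converged` are hypothetical, and the stopping test assumes the usual HSVI criterion, namely that the gap between the upper and lower bound at the initial belief is at most ϵ.

```python
from typing import Tuple


def initial_bounds(r_min: float, r_max: float, gamma: float) -> Tuple[float, float]:
    """Return constant (lower, upper) bounds on any discounted return.

    Every per-step reward lies in [r_min, r_max], so the infinite
    discounted sum lies in [r_min / (1 - gamma), r_max / (1 - gamma)].
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if r_min > r_max:
        raise ValueError("r_min must not exceed r_max")
    scale = 1.0 / (1.0 - gamma)  # geometric series: sum of gamma^t for t >= 0
    return r_min * scale, r_max * scale


def has_converged(lower: float, upper: float, epsilon: float = 0.1) -> bool:
    """Assumed stopping test: bound gap at the initial belief is at most epsilon."""
    return upper - lower <= epsilon


if __name__ == "__main__":
    # Hypothetical reward range and discount factor, for illustration only.
    low, up = initial_bounds(r_min=-1.0, r_max=1.0, gamma=0.5)
    print(low, up, has_converged(low, up))
```

These constant bounds are loose but valid for any belief, which is presumably why they serve as a starting point when no POMDP-specific estimate (blind or MDP) applies.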