ρ-POMDPs have Lipschitz-Continuous ϵ-Optimal Value Functions
Authors: Mathieu Fehr, Olivier Buffet, Vincent Thomas, Jilles Dibangoye
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Then, value function approximators are proposed for both upper- and lower-bounding the optimal value function, which are shown to provide uniformly improvable bounds. This allows proposing two algorithms derived from HSVI which are empirically evaluated on various benchmark problems. |
| Researcher Affiliation | Academia | Mathieu Fehr¹, Olivier Buffet², Vincent Thomas², Jilles Dibangoye³; ¹ École Normale Supérieure de la rue d'Ulm, Paris, France; ² Université de Lorraine, CNRS, Inria, LORIA, Nancy, France; ³ Université de Lyon, INSA Lyon, Inria, CITI, Lyon, France |
| Pseudocode | Yes | Algorithm 1: Heuristic Search Value Iteration & Inc-lc-HSVI |
| Open Source Code | Yes | Full code available here: https://gitlab.inria.fr/buffet/lc-hsvi-nips18 |
| Open Datasets | Yes | The former problems are a diverse set taken from Cassandra's POMDP page |
| Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits. It mentions running algorithms on benchmark problems and setting an epsilon threshold for convergence, but not how data was partitioned into splits. |
| Hardware Specification | Yes | The Java program is run on an i5 CPU M540 at 2.53 GHz. |
| Software Dependencies | No | The paper mentions a 'Java program' but does not specify its version or any other software dependencies with version numbers required for reproducibility. |
| Experiment Setup | Yes | We run x-HSVI (x ∈ {pwlc, pw, lc, inc-lc}) on all benchmark problems (with the exception of pwlc-HSVI not being run on ρ-POMDPs), setting ϵ = 0.1 and a timeout of 600 s. In inc-lc-HSVI, λ is initially set to 1. L and U are initialized (i) for POMDPs, using HSVI1's blind estimate and MDP estimate, and (ii) for ρ-POMDPs, using Rmin / (1 − γ) and Rmax / (1 − γ). |
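
The ρ-POMDP initialization and run limits quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not taken from the authors' Java code: the function names `initial_bounds` and `should_stop` are hypothetical, and the stopping test (bound gap at most ϵ, or timeout reached) is an assumption based on the usual HSVI termination rule rather than something the table states.

```python
def initial_bounds(r_min: float, r_max: float, gamma: float) -> tuple[float, float]:
    """Return (L0, U0) = (Rmin / (1 - gamma), Rmax / (1 - gamma)).

    These are the constant initial lower and upper bounds quoted for
    rho-POMDPs; they bound any discounted sum of rewards in [Rmin, Rmax].
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if r_min > r_max:
        raise ValueError("r_min must not exceed r_max")
    scale = 1.0 / (1.0 - gamma)
    return r_min * scale, r_max * scale


def should_stop(lower: float, upper: float, elapsed_s: float,
                epsilon: float = 0.1, timeout_s: float = 600.0) -> bool:
    """Assumed stopping rule: gap closed to epsilon, or timeout reached.

    The defaults mirror the quoted settings (epsilon = 0.1, 600 s timeout).
    """
    return (upper - lower) <= epsilon or elapsed_s >= timeout_s


if __name__ == "__main__":
    # Hypothetical reward range and discount, chosen only for illustration.
    lower0, upper0 = initial_bounds(r_min=-1.0, r_max=0.0, gamma=0.5)
    print(lower0, upper0)
    print(should_stop(lower0, upper0, elapsed_s=1.0))
```

With these hypothetical inputs the initial gap is far larger than ϵ, so the search would continue until the bounds tighten or the timeout is reached.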