reproducibilityindex.ai

ρ-POMDPs have Lipschitz-Continuous ε-Optimal Value Functions

Authors: Mathieu Fehr, Olivier Buffet, Vincent Thomas, Jilles Dibangoye

NeurIPS 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Then, value function approximators are proposed for both upper- and lower-bounding the optimal value function, which are shown to provide uniformly improvable bounds. This allows proposing two algorithms derived from HSVI which are empirically evaluated on various benchmark problems.
Researcher Affiliation	Academia	Mathieu Fehr¹, Olivier Buffet², Vincent Thomas², Jilles Dibangoye³; ¹ École Normale Supérieure de la rue d'Ulm, Paris, France; ² Université de Lorraine, CNRS, Inria, LORIA, Nancy, France; ³ Université de Lyon, INSA Lyon, Inria, CITI, Lyon, France
Pseudocode	Yes	Algorithm 1: Heuristic Search Value Iteration & Inc-lc-HSVI
Open Source Code	Yes	Full code available here: https://gitlab.inria.fr/buffet/lc-hsvi-nips18
Open Datasets	Yes	The former problems [are] a diverse set taken from Cassandra's POMDP page³
Dataset Splits	No	The paper does not provide specific details on training, validation, or test dataset splits. It mentions running algorithms on benchmark problems and setting an epsilon threshold for convergence, but not how data was partitioned into splits.
Hardware Specification	Yes	The Java program⁴ is run on an i5 CPU M540 at 2.53 GHz.
Software Dependencies	No	The paper mentions a 'Java program' but does not specify its version or any other software dependencies with version numbers required for reproducibility.
Experiment Setup	Yes	We run x-HSVI (x ∈ {pwlc, pw, lc, inc-lc}) on all benchmark problems (with the exception of pwlc-HSVI, which is not run on ρ-POMDPs), setting ϵ = 0.1 and a timeout of 600 s. In inc-lc-HSVI, λ is initially set to 1. L and U are initialized (i) for POMDPs, using HSVI1's blind estimate and MDP estimate, and (ii) for ρ-POMDPs, using Rmin / (1 − γ) and Rmax / (1 − γ).
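To make the experiment setup easier to follow, the Python sketch below shows one possible shape for point-based, Lipschitz-continuous lower and upper bounds over beliefs. They are initialized with the ρ-POMDP defaults Rmin / (1 − γ) and Rmax / (1 − γ) and checked against an HSVI-style stopping test with ϵ = 0.1. It assumes the common "cone" form (an L1 norm with a single scalar Lipschitz constant λ, defaulting to 1 to mirror the stated initial value in inc-lc-HSVI). The paper's actual approximators may differ. The names `LipschitzBound`, `init_rho_pomdp_bounds` and `converged` are hypothetical and do not come from the authors' code repository.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Belief = Tuple[float, ...]


def l1_distance(b1: Sequence[float], b2: Sequence[float]) -> float:
    """L1 distance between two beliefs of the same dimension."""
    return sum(abs(x - y) for x, y in zip(b1, b2))


@dataclass
class LipschitzBound:
    """Hypothetical point-based bound with (assumed scalar) Lipschitz constant lam.

    Upper bound: U(b) = min(default, min_i [u_i + lam * ||b - b_i||_1])
    Lower bound: L(b) = max(default, max_i [l_i - lam * ||b - b_i||_1])
    """
    lam: float
    default: float  # constant initial bound, e.g. Rmax / (1 - gamma)
    upper: bool
    points: List[Tuple[Belief, float]] = field(default_factory=list)

    def value(self, b: Sequence[float]) -> float:
        best = self.default
        for b_i, v_i in self.points:
            d = l1_distance(b, b_i)
            if self.upper:
                best = min(best, v_i + self.lam * d)  # cone with its vertex at the bottom
            else:
                best = max(best, v_i - self.lam * d)  # cone with its vertex at the top
        return best

    def add(self, b: Sequence[float], v: float) -> None:
        """Record a point update (b, v), e.g. the result of a backup at belief b."""
        self.points.append((tuple(b), v))


def init_rho_pomdp_bounds(r_min: float, r_max: float, gamma: float, lam: float = 1.0):
    """Initial constant bounds for a rho-POMDP: Rmin/(1-gamma) and Rmax/(1-gamma)."""
    lower = LipschitzBound(lam=lam, default=r_min / (1.0 - gamma), upper=False)
    upper = LipschitzBound(lam=lam, default=r_max / (1.0 - gamma), upper=True)
    return lower, upper


def converged(lower: LipschitzBound, upper: LipschitzBound,
              b0: Sequence[float], eps: float = 0.1) -> bool:
    """HSVI-style stopping test: the gap U(b0) - L(b0) is at most eps."""
    return upper.value(b0) - lower.value(b0) <= eps
```

For example, with Rmin = 0, Rmax = 1 and γ = 0.5, the initial gap at any belief is 2.0, well above ϵ = 0.1, so HSVI-style trials would keep refining the bounds by adding points until the gap at the initial belief closes.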