Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Point-Based Value Iteration for Finite-Horizon POMDPs

Authors: Erwin Walraven, Matthijs T. J. Spaan

JAIR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments we demonstrate that the algorithm is an effective method for solving finite-horizon POMDPs.
Researcher Affiliation | Academia | Erwin Walraven EMAIL Matthijs T. J. Spaan EMAIL Delft University of Technology, Van Mourik Broekmanweg 6, 2628 XE Delft, The Netherlands
Pseudocode | Yes | Algorithm 1: Sawtooth approximation (UB), Algorithm 2: Finite-horizon point-based Value Iteration (FiVI), Algorithm 3: Belief expansion (expand), Algorithm 4: Perseus Belief Selection (PBS)
Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | We use multiple domains from pomdp.org, which we solve with horizons h = 5, 10, 15, 20.
Dataset Splits | No | The paper uses POMDP domains to test the algorithms, which are problem definitions rather than datasets requiring explicit training/test/validation splits. No specific dataset split information is provided for reproducibility in terms of data partitioning for learning.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or detailed computer specifications) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., programming languages, libraries, or external solvers) used to implement and run the described algorithms.
Experiment Setup | Yes | We let the algorithms run for at most 15 minutes, after which execution is terminated. Furthermore, we stop algorithm execution if the gap between the lower bound and upper bound drops below 0.01. For DBBU we consider the parameters θ = 10, 20, 30, 40. We let the algorithm sample beliefs during 1000 episodes. During our experiments we use discretization parameter D = 10. The third method we consider is the infinite-horizon algorithm GapMin which computes an infinite-horizon policy with γ = 0.99.
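
The stopping rule quoted under Experiment Setup (a 15-minute time limit combined with a bound-gap threshold of 0.01) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: `run_with_limits`, `step`, and `clock` are hypothetical names, and `step` stands in for one iteration of any solver that returns a (lower bound, upper bound) pair.

```python
import time


def run_with_limits(step, time_limit_s=15 * 60, gap_tol=0.01, clock=time.monotonic):
    """Repeatedly call step() until the bound gap or the time limit stops the run.

    step  -- hypothetical callable performing one solver iteration and
             returning a (lower_bound, upper_bound) tuple
    clock -- time source in seconds; injectable so the limit can be tested
    """
    start = clock()
    iterations = 0
    lower, upper = float("-inf"), float("inf")

    # The time limit is checked before each iteration begins.
    while clock() - start < time_limit_s:
        lower, upper = step()
        iterations += 1
        # Stop early once the gap between the bounds drops below the tolerance.
        if upper - lower < gap_tol:
            return lower, upper, iterations, "converged"

    return lower, upper, iterations, "timeout"
```

The time limit here is only checked between iterations, so a single long iteration can overrun it; whether the paper's implementation interrupts mid-iteration is not stated in the quoted text.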