Heuristic Search Value Iteration for One-Sided Partially Observable Stochastic Games

Authors: Karel Horák, Branislav Bošanský, Michal Pěchouček

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we demonstrate the applicability and scalability of our algorithm on three different domains: pursuit-evasion, patrolling, and search games. The results show that our algorithm can closely approximate solutions of large games with more than 4000 states. We demonstrate application possibilities and scalability of our algorithm on three types of games: pursuit-evasion games (e.g., evaluated in (Horak and Bosansky 2016)), intrusion search games (e.g., see (Bosansky et al. 2014)), and patrolling games with a discount factor (e.g., see (Vorobeychik et al. 2014)).
Researcher Affiliation | Academia | Karel Horák, Branislav Bošanský, Michal Pěchouček, Department of Computer Science, Faculty of Electrical Engineering, Czech Technical University in Prague, {horak,bosansky,pechoucek}@agents.fel.cvut.cz
Pseudocode | Yes | Algorithm 1: HSVI algorithm for one-sided POSGs (a structural sketch of an HSVI-style loop follows the table)
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology or a direct link to a code repository.
Open Datasets | No | We demonstrate application possibilities and scalability of our algorithm on three types of games: pursuit-evasion games (e.g., evaluated in (Horak and Bosansky 2016)), intrusion search games (e.g., see (Bosansky et al. 2014)), and patrolling games with a discount factor (e.g., see (Vorobeychik et al. 2014)). Following the setting in (Vorobeychik et al. 2014), we focus on graphs generated from the Erdős-Rényi model (Newman 2010) with parameter p = 0.25 (denoted ER(0.25)). (The paper refers to existing models and prior work as a basis for generating game instances but does not provide concrete access information for a specific, publicly available dataset used for training. A graph-generation sketch follows the table.)
Dataset Splits | No | Unless stated otherwise the discount factor is γ = 0.95 and we ran the algorithm until gap(v̂(b0)) ≤ 1. (This specifies a stopping criterion for the algorithm's convergence, but the paper does not define explicit train/validation/test dataset splits, percentages, or absolute sample counts for data partitioning.)
Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory, or cloud instance types) used for running experiments are mentioned.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, CPLEX 12.4) are mentioned in the paper.
Experiment Setup | Yes | Unless stated otherwise the discount factor is γ = 0.95 and we ran the algorithm until gap(v̂(b0)) ≤ 1. We initialize the value functions by solving the perfect-information refinement of the game (for the upper bound v̄) and as a best response to a uniform strategy of player 1 (for the lower bound v). We use standard value iteration for stochastic games, or MDPs, respectively, and terminate the initialization when either the change in valuations between iterations is lower than 0.025, or a 20 s time limit has expired. Similarly to (Smith and Simmons 2004), we adjust ϵ in each iteration using the formula ϵ = 0.25 + η(gap(v̂(b0)) - 0.25) with η = 0.9. We set the neighborhood parameter R to the largest value satisfying ρ(t) ≤ 0.25γ^t for all t ≤ tmax from the proof of Theorem 3. Finally, we remove dominated points and vectors from sets Γ and Υ whenever their size grows by 20% to reduce the size of the linear programs.
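The Pseudocode row above refers to the paper's Algorithm 1. The following is only a minimal structural sketch of an HSVI-style loop, not the paper's algorithm: the bound objects `v_lower`/`v_upper` and the helpers `forward_exploration_step` and `point_based_update` are hypothetical placeholders for the one-sided POSG machinery (stage-game LPs, belief updates) that Algorithm 1 actually specifies.

```python
# Minimal structural sketch of an HSVI-style loop (NOT the paper's Algorithm 1).
# v_lower / v_upper, forward_exploration_step and point_based_update are
# hypothetical placeholders for the one-sided POSG machinery in the paper.

GAMMA = 0.95       # discount factor reported in the Experiment Setup row
TARGET_GAP = 1.0   # the paper runs until gap(v̂(b0)) <= 1

def gap(v_lower, v_upper, belief):
    """Width of the interval between the upper and lower bound at a belief."""
    return v_upper.value(belief) - v_lower.value(belief)

def hsvi(game, b0, v_lower, v_upper, adjust_epsilon):
    """Tighten the bounds at the initial belief b0 until the gap target is met."""
    while gap(v_lower, v_upper, b0) > TARGET_GAP:
        epsilon = adjust_epsilon(gap(v_lower, v_upper, b0))
        explore(game, b0, v_lower, v_upper, epsilon, depth=0)

def explore(game, belief, v_lower, v_upper, epsilon, depth):
    """Forward exploration while the excess gap at the current depth is positive."""
    if gap(v_lower, v_upper, belief) <= epsilon * GAMMA ** (-depth):
        return
    # Action/observation selection and the belief update are where the
    # one-sided POSG details of Algorithm 1 live; abstracted into one call here.
    next_belief = forward_exploration_step(game, belief, v_lower, v_upper)
    explore(game, next_belief, v_lower, v_upper, epsilon, depth + 1)
    # Point-based backup of both bounds at the current belief on the way back up.
    point_based_update(game, belief, v_lower, v_upper)
```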
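The Open Datasets row notes that the patrolling instances are generated from the Erdős-Rényi model with p = 0.25 rather than drawn from a released dataset. A minimal generation sketch, assuming the networkx library; the node count and seed below are illustrative choices, not values from the paper:

```python
# Sketch: drawing ER(0.25) random graphs as described for the patrolling games.
import networkx as nx

def make_patrolling_graph(num_nodes, p=0.25, seed=None):
    """Sample a graph from the Erdos-Renyi G(n, p) model with p = 0.25."""
    return nx.erdos_renyi_graph(num_nodes, p, seed=seed)

if __name__ == "__main__":
    g = make_patrolling_graph(num_nodes=10, seed=0)   # illustrative size/seed
    print(g.number_of_nodes(), "nodes,", g.number_of_edges(), "edges")
```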
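The Experiment Setup row fixes several numeric details: the ϵ adjustment ϵ = 0.25 + η(gap(v̂(b0)) - 0.25) with η = 0.9, value-iteration initialization stopped once the change between iterations falls below 0.025 or after 20 s, and pruning of Γ and Υ whenever they grow by 20%. A minimal sketch of these bookkeeping pieces, with all names chosen here for illustration (the `backup` callable is a stand-in for one sweep of the actual value-iteration update):

```python
import time

ETA = 0.9  # adjustment rate eta from the Experiment Setup row

def adjust_epsilon(current_gap, eta=ETA):
    """epsilon = 0.25 + eta * (gap(v̂(b0)) - 0.25), recomputed each iteration."""
    return 0.25 + eta * (current_gap - 0.25)

def initialize_by_value_iteration(backup, values, change_tol=0.025, time_limit=20.0):
    """Run value iteration for the bound initialization until the largest
    per-state change drops below 0.025 or the 20-second limit expires."""
    start = time.time()
    while True:
        new_values = backup(values)                      # one VI sweep over all states
        change = max(abs(new_values[s] - values[s]) for s in values)
        values = new_values
        if change < change_tol or time.time() - start > time_limit:
            return values

def should_prune(current_size, size_at_last_prune, growth=0.20):
    """Remove dominated points/vectors once a set has grown by 20%."""
    return current_size >= (1.0 + growth) * size_at_last_prune
```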