Heuristic Search Value Iteration for One-Sided Partially Observable Stochastic Games
Authors: Karel Horák, Branislav Bošanský, Michal Pěchouček
AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we demonstrate the applicability and scalability of our algorithm on three different domains: pursuit-evasion, patrolling, and search games. The results show that our algorithm can closely approximate solutions of large games with more than 4000 states. We demonstrate application possibilities and scalability of our algorithm on three types of games: pursuit-evasion games (e.g., evaluated in (Horak and Bosansky 2016)), intrusion search games (e.g., see (Bosansky et al. 2014)), and patrolling games with a discount factor (e.g., see (Vorobeychik et al. 2014)). |
| Researcher Affiliation | Academia | Karel Horák, Branislav Bošanský, Michal Pěchouček, Department of Computer Science, Faculty of Electrical Engineering, Czech Technical University in Prague {horak,bosansky,pechoucek}@agents.fel.cvut.cz |
| Pseudocode | Yes | Algorithm 1: HSVI algorithm for one-sided POSGs |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology or a direct link to a code repository. |
| Open Datasets | No | We demonstrate application possibilities and scalability of our algorithm on three types of games: pursuit-evasion games (e.g., evaluated in (Horak and Bosansky 2016)), intrusion search games (e.g., see (Bosansky et al. 2014)), and patrolling games with a discount factor (e.g., see (Vorobeychik et al. 2014)). Following the setting in (Vorobeychik et al. 2014), we focus on graphs generated from Erdos-Renyi model (Newman 2010) with parameter p = 0.25 (denoted ER(0.25)). (The paper refers to existing models and prior work as a basis for generating game instances but does not provide concrete access information for a specific, publicly available dataset used for training.) |
| Dataset Splits | No | Unless stated otherwise the discount factor is γ = 0.95 and we ran the algorithm until gap(v̂(b0)) ≤ 1. (This specifies a stopping criterion for the algorithm's convergence, but the paper does not define explicit train/validation/test dataset splits, percentages, or absolute sample counts for data partitioning.) |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory, or cloud instance types) used for running experiments are mentioned. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, CPLEX 12.4) are mentioned in the paper. |
| Experiment Setup | Yes | Unless stated otherwise the discount factor is γ = 0.95 and we ran the algorithm until gap(v̂(b0)) ≤ 1. We initialize the value functions by solving the perfect-information refinement of the game (for the upper bound v̄) and as a best response to a uniform strategy of player 1 (for the lower bound v). We use standard value iteration for stochastic games, or MDPs, respectively, and terminate the initialization when either the change in valuations between iterations drops below 0.025 or a 20 s time limit expires. Similarly to (Smith and Simmons 2004), we adjust ε in each iteration using the formula ε = 0.25 + η(gap(v̂(b0)) - 0.25) with η = 0.9. We set the neighborhood parameter R to the largest value satisfying ρ(t) ≤ 0.25·γ^t for all t ≤ tmax from the proof of Theorem 3. Finally, we remove dominated points and vectors from the sets Γ and Υ whenever their size grows by 20% to reduce the size of the linear programs. |
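
The ε-adjustment rule and the stopping criterion quoted in the Experiment Setup row can be read as a simple outer loop. The snippet below is a minimal Python sketch of that loop under the stated parameter values; `gap_at_initial_belief` and `explore` are hypothetical placeholders for the paper's HSVI routines (Algorithm 1), not code released by the authors.

```python
# Illustrative sketch (not the authors' code) of the experiment-setup loop
# described above: the exploration precision eps is tightened each iteration
# via eps = 0.25 + eta * (gap(v_hat(b0)) - 0.25) with eta = 0.9, and the run
# stops once the bound gap at the initial belief drops to 1 or below.
# `gap_at_initial_belief` and `explore` are hypothetical placeholders for the
# HSVI subroutines; their exact signatures are assumptions.

GAMMA = 0.95      # discount factor (paper default)
ETA = 0.9         # eps-adjustment rate, following Smith and Simmons (2004)
TARGET_GAP = 1.0  # termination: gap(v_hat(b0)) <= 1


def run_hsvi(b0, gap_at_initial_belief, explore):
    """Outer loop: keep exploring with a gradually tightened precision eps."""
    while True:
        gap = gap_at_initial_belief(b0)   # gap between upper and lower bound
        if gap <= TARGET_GAP:
            return gap                    # desired precision reached
        eps = 0.25 + ETA * (gap - 0.25)   # per-iteration precision target
        explore(b0, eps, GAMMA)           # one heuristic forward exploration
```

Read this way, the schedule keeps ε proportional to the current gap, so early explorations tolerate loose bounds while later iterations demand progressively tighter ones.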