Optimally Solving Two-Agent Decentralized POMDPs Under One-Sided Information Sharing
Authors: Yuxuan Xie, Jilles Dibangoye, Olivier Buffet
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To support our findings, we provide three variants of the Heuristic Search Value Iteration (HSVI) algorithm (Smith, 2007) using either PWLC or linear value-function representations and compare them on standard problems from the literature. ... In all tested benchmarks, HSVI1 outperforms both HSVI2 and HSVI3, providing near-optimal (if not optimal) values at the initial state. |
| Researcher Affiliation | Academia | ¹Univ Lyon, INSA Lyon, INRIA, CITI, F-69621 Villeurbanne, France; ²Université de Lorraine, INRIA, CNRS, LORIA, F-54000 Nancy, France. |
| Pseudocode | Yes | Algorithm 1: The HSVI Algorithm for bo MDP M. |
| Open Source Code | Yes | The software and data we used to generate the experiments are available at https://gitlab.inria.fr/jdibango/osisdec-pomdps. |
| Open Datasets | No | The paper states 'All used domains are available at masplan.org' but does not provide direct links, DOIs, specific repository names for each dataset, or formal citations with authors and year for the individual datasets. |
| Dataset Splits | No | The paper does not provide specific training, validation, or test dataset splits (e.g., percentages, sample counts, or explicit references to standard predefined splits with citations). |
| Hardware Specification | Yes | We ran our variants of HSVI algorithm on an Ubuntu machine with 3.0GHz Xeon E5 CPU and 32GB available RAM. |
| Software Dependencies | No | The paper states 'We solved the MILPs using ILOG CPLEX Optimization Studio,' but does not specify a version number for the software. |
| Experiment Setup | Yes | For each of them we compare our variants of HSVI for planning horizon ℓ = 10 and discount factor γ = 1 and report different statistics, i.e., time, memory, number of trials, value, and gap. We set the time limit at 5 hours. |