Learning to Act in Decentralized Partially Observable MDPs

Authors: Jilles Dibangoye, Olivier Buffet

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show our approach can learn to act near-optimally in many finite domains from the literature.
Researcher Affiliation | Academia | (1) Univ Lyon, INSA Lyon, INRIA, CITI, F-69621 Villeurbanne, France; (2) INRIA / Université de Lorraine, Nancy, France.
Pseudocode | Yes | Algorithm 1: The oSARSA Algorithm
Open Source Code | No | The paper does not provide explicit access (a link or a statement of availability) to the source code for the methodology it describes.
Open Datasets | Yes | We evaluate our algorithm on multiple 2-agent benchmarks from the literature, all available at masplan.org: Mabc, Recycling, Gridsmall, Grid3x3corners, Boxpushing, and Tiger.
Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits (e.g., percentages, sample counts, or an explicit splitting methodology).
Hardware Specification | Yes | We ran the oSARSA algorithm on a Mac OS X machine with a 3.8 GHz Core i5 and 8 GB of available RAM.
Software Dependencies | No | We solved the MILPs using ILOG CPLEX Optimization Studio. However, no specific CPLEX version number is provided.
Experiment Setup | Yes | For REINFORCE and oSARSA, we used hyper-parameters ϵ and β ranging from 1 to 10^-3 with a decaying factor of 10^4, and a sample size |D| = 10^4. We use a maximum of 10^5 episodes and a time limit of 5 hours as our stopping criteria.
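The Experiment Setup row does not spell out the exact form of the decay schedule. As a minimal sketch, assuming the hyper-parameters ϵ and β decay geometrically from 1 to 10^-3 over 10^4 episodes and are then held at 10^-3 (one possible reading of "decaying factor of 10^4", not confirmed by the paper), the schedule and the two stopping criteria could be written as follows. The names `decayed` and `should_stop` are hypothetical.

```python
import time

# Assumed reading of the reported schedule: decay geometrically from START
# to END over DECAY_STEPS episodes, then hold at END (hypothetical).
START, END, DECAY_STEPS = 1.0, 1e-3, 10**4
MAX_EPISODES = 10**5           # stopping criterion 1: episode budget
TIME_LIMIT_S = 5 * 60 * 60     # stopping criterion 2: 5 hours of wall-clock time


def decayed(t: int, start: float = START, end: float = END,
            steps: int = DECAY_STEPS) -> float:
    """Value of a decaying hyper-parameter (e.g. epsilon or beta) at episode t."""
    if t >= steps:
        return end
    # Interpolate linearly in log space between start and end.
    return start * (end / start) ** (t / steps)


def should_stop(episode: int, started_at: float, now: float) -> bool:
    """True once either the episode budget or the time budget is exhausted."""
    return episode >= MAX_EPISODES or (now - started_at) >= TIME_LIMIT_S


if __name__ == "__main__":
    t0 = time.monotonic()
    for episode in (0, 5_000, 10_000, 50_000):
        print(episode, round(decayed(episode), 6),
              should_stop(episode, t0, time.monotonic()))
```

Decaying in log space makes the value pass through every order of magnitude between 1 and 10^-3 at an even rate; a linear or inverse-time schedule would be an equally plausible reading of the setup.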