Learning to Act in Decentralized Partially Observable MDPs
Authors: Jilles Dibangoye, Olivier Buffet
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show our approach can learn to act near-optimally in many finite domains from the literature. |
| Researcher Affiliation | Academia | (1) Univ Lyon, INSA Lyon, INRIA, CITI, F-69621 Villeurbanne, France; (2) INRIA / Université de Lorraine, Nancy, France. |
| Pseudocode | Yes | Algorithm 1: The oSARSA Algorithm |
| Open Source Code | No | The paper does not provide explicit access (link or statement of availability) to the source code for the methodology it describes. |
| Open Datasets | Yes | We evaluate our algorithm on multiple 2-agent benchmarks from the literature all available at masplan.org: Mabc, Recycling, Gridsmall, Grid3x3corners, Boxpushing, and Tiger. |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits (e.g., percentages, sample counts, or explicit splitting methodology). |
| Hardware Specification | Yes | We ran the oSARSA algorithm on a Mac OS X machine with a 3.8 GHz Core i5 and 8 GB of available RAM. |
| Software Dependencies | No | We solved the MILPs using ILOG CPLEX Optimization Studio. However, a specific version number for CPLEX is not provided. |
| Experiment Setup | Yes | For REINFORCE and oSARSA, we used hyper-parameters ϵ and β ranging from 1 to 10^-3 with a decaying factor of 10^-4, and sample size |D| = 10^4. We use a maximum of 10^5 episodes and a time limit of 5 hours, respectively, as our stopping criteria. |
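
The experiment-setup row above describes decaying ϵ and β schedules together with a two-part stopping criterion (episode budget and wall-clock limit). The Python sketch below illustrates one plausible reading of that schedule; the additive decay form, the variable names, and the `run_episode` placeholder are assumptions for illustration, not details taken from the paper.

```python
import time

# Assumed reading of the reported setup: epsilon and beta decay from 1 down to
# 1e-3 with a decay step of 1e-4 (additive decay is an assumption); training
# stops after 1e5 episodes or 5 hours of wall-clock time, whichever comes first.
EPS_START, EPS_MIN = 1.0, 1e-3
BETA_START, BETA_MIN = 1.0, 1e-3
DECAY_STEP = 1e-4
MAX_EPISODES = 10**5
TIME_LIMIT_SECONDS = 5 * 3600


def run_episode(epsilon: float, beta: float) -> None:
    """Hypothetical placeholder for one oSARSA training episode."""
    pass


epsilon, beta = EPS_START, BETA_START
start = time.time()
for episode in range(MAX_EPISODES):
    if time.time() - start > TIME_LIMIT_SECONDS:
        break  # 5-hour wall-clock limit reached
    run_episode(epsilon, beta)
    # Decay both schedules, never dropping below their floors.
    epsilon = max(EPS_MIN, epsilon - DECAY_STEP)
    beta = max(BETA_MIN, beta - DECAY_STEP)
```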