reproducibilityindex.ai

Learning to Act in Decentralized Partially Observable MDPs

Authors: Jilles Dibangoye, Olivier Buffet

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments show our approach can learn to act near-optimally in many finite domains from the literature.
Researcher Affiliation	Academia	(1) Univ Lyon, INSA Lyon, INRIA, CITI, F-69621 Villeurbanne, France; (2) INRIA / Université de Lorraine, Nancy, France.
Pseudocode	Yes	Algorithm 1: The oSARSA Algorithm
Open Source Code	No	The paper does not provide explicit access (link or statement of availability) to the source code for the methodology it describes.
Open Datasets	Yes	We evaluate our algorithm on multiple 2-agent benchmarks from the literature all available at masplan.org: Mabc, Recycling, Gridsmall, Grid3x3corners, Boxpushing, and Tiger.
Dataset Splits	No	The paper does not provide specific training/validation/test dataset splits (e.g., percentages, sample counts, or explicit splitting methodology).
Hardware Specification	Yes	We ran the oSARSA algorithm on a Mac OSX machine with 3.8GHz Core i5 and 8GB of available RAM.
Software Dependencies	No	We solved the MILPs using ILOG CPLEX Optimization Studio. However, a specific version number for CPLEX is not provided.
Experiment Setup	Yes	For REINFORCE and oSARSA, we used hyper-parameters ϵ and β ranging from 1 to 10^-3 with a decaying factor of 10^4, sample size |D| = 10^4. We use maximum episodes and time limit 10^5 and 5 hours, respectively, as our stopping criteria.
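
The experiment setup above can be pictured as a small configuration plus two helpers. This is only a sketch under stated assumptions: the paper excerpt does not say how the "decaying factor" enters the schedule, so the geometric interpolation below is a guess. The names `decayed` and `should_stop` are hypothetical and do not come from the authors' code.

```python
import time

# Values quoted in the Experiment Setup row.
HP_START, HP_END = 1.0, 1e-3   # range for epsilon and beta
DECAY = 10**4                  # "decaying factor"; its exact role is an assumption
SAMPLE_SIZE = 10**4            # |D|
MAX_EPISODES = 10**5           # episode budget
TIME_LIMIT_S = 5 * 3600        # 5-hour wall-clock budget, in seconds


def decayed(episode, start=HP_START, end=HP_END, decay=DECAY):
    """Hypothetical schedule: geometric interpolation from start to end
    over `decay` episodes, then held constant at end."""
    fraction = min(episode / decay, 1.0)
    return start * (end / start) ** fraction


def should_stop(episode, started_at, now=None):
    """Return True once the episode budget or the time budget is exhausted."""
    now = time.monotonic() if now is None else now
    return episode >= MAX_EPISODES or (now - started_at) >= TIME_LIMIT_S
```

A training loop would call `decayed(episode)` once for ϵ and once for β at the start of each episode, and would exit as soon as `should_stop(episode, started_at)` returns true.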