Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning to Act in Decentralized Partially Observable MDPs

Authors: Jilles Dibangoye, Olivier Buffet

ICML 2018

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments show our approach can learn to act near-optimally in many finite domains from the literature. |
| Researcher Affiliation | Academia | ¹Univ Lyon, INSA Lyon, INRIA, CITI, F-69621 Villeurbanne, France ²INRIA / Université de Lorraine, Nancy, France. |
| Pseudocode | Yes | Algorithm 1 The oSARSA Algorithm |
| Open Source Code | No | The paper does not provide explicit access (link or statement of availability) to the source code for the methodology it describes. |
| Open Datasets | Yes | We evaluate our algorithm on multiple 2-agent benchmarks from the literature all available at masplan.org: Mabc, Recycling, Gridsmall, Grid3x3corners, Boxpushing, and Tiger. |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits (e.g., percentages, sample counts, or explicit splitting methodology). |
| Hardware Specification | Yes | We ran the oSARSA algorithm on a Mac OSX machine with 3.8GHz Core i5 and 8GB of available RAM. |
| Software Dependencies | No | We solved the MILPs using ILOG CPLEX Optimization Studio. However, a specific version number for CPLEX is not provided. |
| Experiment Setup | Yes | For REINFORCE and oSARSA, we used hyper-parameters ε and β ranging from 1 to 10⁻³ with a decaying factor of 10⁴, sample size \|D\| = 10⁴. We use maximum episodes and time limit 10⁵ and 5 hours, respectively, as our stopping criteria. |
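
For readers trying to reproduce the runs, the Experiment Setup values can be collected into a small configuration object with a stopping check. This is only a sketch under stated assumptions: the class, field, and function names are hypothetical and do not come from the paper, the exponents are read as 1 down to 10⁻³ for ε and β, 10⁴ for the decaying factor and sample size, and 10⁵ for the episode cap, and the decay schedule itself is left out because its exact form is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Values quoted in the Experiment Setup row; all field names are hypothetical."""
    param_start: float = 1.0        # initial value of epsilon and beta
    param_end: float = 1e-3         # final value of epsilon and beta (assumed reading)
    decay_factor: float = 1e4       # "decaying factor"; schedule form is not specified
    sample_size: int = 10**4        # |D|
    max_episodes: int = 10**5       # first stopping criterion
    time_limit_s: float = 5 * 3600  # second stopping criterion: 5 hours, in seconds


def should_stop(setup: ExperimentSetup, episode: int, elapsed_s: float) -> bool:
    """Return True once either stopping criterion has been reached."""
    return episode >= setup.max_episodes or elapsed_s >= setup.time_limit_s
```

A training loop would call `should_stop(setup, episode, elapsed_s)` once per episode, passing the wall-clock time since the run started, and exit as soon as it returns `True`.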