Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Learning to Act in Decentralized Partially Observable MDPs
Authors: Jilles Dibangoye, Olivier Buffet
ICML 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show our approach can learn to act near-optimally in many finite domains from the literature. |
| Researcher Affiliation | Academia | ¹Univ Lyon, INSA Lyon, INRIA, CITI, F-69621 Villeurbanne, France; ²INRIA / Université de Lorraine, Nancy, France. |
| Pseudocode | Yes | Algorithm 1: The oSARSA Algorithm |
| Open Source Code | No | The paper does not provide explicit access (link or statement of availability) to the source code for the methodology it describes. |
| Open Datasets | Yes | We evaluate our algorithm on multiple 2-agent benchmarks from the literature all available at masplan.org: Mabc, Recycling, Gridsmall, Grid3x3corners, Boxpushing, and Tiger. |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits (e.g., percentages, sample counts, or explicit splitting methodology). |
| Hardware Specification | Yes | We ran the oSARSA algorithm on a Mac OSX machine with 3.8GHz Core i5 and 8GB of available RAM. |
| Software Dependencies | No | We solved the MILPs using ILOG CPLEX Optimization Studio. However, a specific version number for CPLEX is not provided. |
| Experiment Setup | Yes | For REINFORCE and oSARSA, we used hyper-parameters ϵ and β ranging from 1 to 10⁻³ with a decaying factor of 10⁴, sample size \|D\| = 10⁴. We use maximum episodes and time limit 10⁵ and 5 hours, respectively, as our stopping criteria. |
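
The stopping criteria quoted in the Experiment Setup row can be sketched as a small budget check. This is an illustrative sketch only, not the authors' code: it reads the limits as 10^5 episodes and 5 hours, and it assumes training halts as soon as either budget is exhausted. The names `StoppingCriteria`, `should_stop`, and `run` are hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class StoppingCriteria:
    # Limits as read from the Experiment Setup row (assumed interpretation).
    max_episodes: int = 10**5
    time_limit_seconds: float = 5 * 3600.0

    def should_stop(self, episodes_done: int, elapsed_seconds: float) -> bool:
        """Return True once either the episode or the time budget is used up."""
        return (episodes_done >= self.max_episodes
                or elapsed_seconds >= self.time_limit_seconds)


def run(step, criteria: StoppingCriteria = StoppingCriteria(),
        clock=time.monotonic) -> int:
    """Call `step()` once per episode until a stopping criterion fires.

    `step` stands in for one learning episode (hypothetical placeholder).
    Returns the number of episodes completed.
    """
    start = clock()
    episodes = 0
    while not criteria.should_stop(episodes, clock() - start):
        step()
        episodes += 1
    return episodes
```

Passing a custom `clock` makes the time budget testable without waiting; in normal use the default monotonic clock is enough.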