Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability
Authors: Shayegan Omidshafiei, Jason Pazis, Christopher Amato, Jonathan P. How, John Vian
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6. Evaluation. We first evaluate single-task performance of the introduced Dec-HDRQN approach on a series of increasingly challenging domains. ... Performance is evaluated on both multi-agent single-target (MAST) and multi-agent multi-target (MAMT) capture domains... |
| Researcher Affiliation | Collaboration | ¹Laboratory for Information and Decision Systems (LIDS), MIT, Cambridge, MA, USA; ²College of Computer and Information Science (CCIS), Northeastern University, Boston, MA, USA; ³Boeing Research & Technology, Seattle, WA, USA. |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code availability. |
| Open Datasets | Yes | Performance is evaluated on both multi-agent single-target (MAST) and multi-agent multi-target (MAMT) capture domains, variations of the existing meeting-in-a-grid Dec-POMDP benchmark (Amato et al., 2009). |
| Dataset Splits | No | The paper evaluates performance based on 'randomly-initialized episodes' and 'training epochs' rather than explicit train/validation/test dataset splits with percentages or counts. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running experiments, such as GPU/CPU models or processor types. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer' and 'DRQNs' but does not provide specific version numbers for any libraries or frameworks used, such as 'PyTorch 1.9' or 'TensorFlow 2.0'. |
| Experiment Setup | Yes | All experiments use DRQNs with 2 multi-layer perceptron (MLP) layers, an LSTM layer (Hochreiter & Schmidhuber, 1997) with 64 memory cells, and another 2 MLP layers. MLPs have 32 hidden units each and rectified linear unit nonlinearities are used throughout, with the exception of the final (linear) layer. Experiments use γ = 0.95 and Adam optimizer (Kingma & Ba, 2014) with base learning rate 0.001. Dec-HDRQNs use hysteretic learning rate β = 0.2 to 0.4. |
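
The Experiment Setup row can be summarized as a small configuration sketch. The field names below are hypothetical, and the values are only those quoted in the row. The update function illustrates generic hysteretic Q-learning on a single scalar value, in which negative temporal-difference errors are applied with a reduced rate. Treating β as a multiplier on the base learning rate is an assumption made for illustration; this summary does not state how the paper combines β with Adam.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecHDRQNSetup:
    """Hyperparameters quoted in the Experiment Setup row (field names are hypothetical)."""
    mlp_layers_before_lstm: int = 2
    lstm_memory_cells: int = 64
    mlp_layers_after_lstm: int = 2
    mlp_hidden_units: int = 32
    gamma: float = 0.95                # discount factor
    base_learning_rate: float = 0.001  # Adam base learning rate
    hysteretic_beta: float = 0.2       # reported range is 0.2 to 0.4


def hysteretic_td_step(q_value: float, td_target: float, alpha: float, beta: float) -> float:
    """One scalar hysteretic update (illustrative, not the paper's network training code).

    Assumption: non-negative TD errors use the full rate alpha, while
    negative TD errors use the reduced rate alpha * beta.
    """
    delta = td_target - q_value
    rate = alpha if delta >= 0 else alpha * beta
    return q_value + rate * delta


if __name__ == "__main__":
    setup = DecHDRQNSetup()
    # A positive TD error moves the estimate at the full rate.
    print(hysteretic_td_step(0.0, 1.0, alpha=0.1, beta=setup.hysteretic_beta))
    # A negative TD error moves the estimate more cautiously.
    print(hysteretic_td_step(1.0, 0.0, alpha=0.1, beta=setup.hysteretic_beta))
```

The asymmetry makes each agent slower to lower its value estimates, which is the usual motivation for hysteretic learners in cooperative multi-agent settings.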