Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability
Authors: Shayegan Omidshafiei, Jason Pazis, Christopher Amato, Jonathan P. How, John Vian
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first evaluate single-task performance of the introduced Dec-HDRQN approach on a series of increasingly challenging domains. ... Performance is evaluated on both multi-agent single-target (MAST) and multi-agent multi-target (MAMT) capture domains... |
| Researcher Affiliation | Collaboration | (1) Laboratory for Information and Decision Systems (LIDS), MIT, Cambridge, MA, USA; (2) College of Computer and Information Science (CCIS), Northeastern University, Boston, MA, USA; (3) Boeing Research & Technology, Seattle, WA, USA. |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code availability. |
| Open Datasets | Yes | Performance is evaluated on both multi-agent single-target (MAST) and multi-agent multi-target (MAMT) capture domains, variations of the existing meeting-in-a-grid Dec-POMDP benchmark (Amato et al., 2009). |
| Dataset Splits | No | The paper evaluates performance based on 'randomly-initialized episodes' and 'training epochs' rather than explicit train/validation/test dataset splits with percentages or counts. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running experiments, such as GPU/CPU models or processor types. |
| Software Dependencies | No | The paper mentions software components such as the Adam optimizer and DRQNs but does not provide version numbers for any libraries or frameworks used (e.g., 'PyTorch 1.9' or 'TensorFlow 2.0'). |
| Experiment Setup | Yes | All experiments use DRQNs with 2 multi-layer perceptron (MLP) layers, an LSTM layer (Hochreiter & Schmidhuber, 1997) with 64 memory cells, and another 2 MLP layers. MLPs have 32 hidden units each and rectified linear unit nonlinearities are used throughout, with the exception of the final (linear) layer. Experiments use γ = 0.95 and Adam optimizer (Kingma & Ba, 2014) with base learning rate 0.001. Dec-HDRQNs use hysteretic learning rate β = 0.2 to 0.4. |
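The quoted experiment setup pins down the network shape and the hysteretic learning rule well enough to sketch. Below is a minimal PyTorch sketch, not the authors' code: `obs_dim`, `n_actions`, the `batch_first` sequence layout, and applying hysteresis by down-weighting negative TD errors in the loss are assumptions; the layer widths (32 hidden units), the 64-cell LSTM, ReLU activations, the linear output layer, γ = 0.95, the Adam learning rate 0.001, and β in [0.2, 0.4] come from the paper.

```python
import torch
import torch.nn as nn

class DRQN(nn.Module):
    """2 MLP layers -> LSTM (64 memory cells) -> 2 MLP layers, per the
    quoted setup. obs_dim and n_actions are hypothetical placeholders."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.mlp_in = nn.Sequential(
            nn.Linear(obs_dim, 32), nn.ReLU(),
            nn.Linear(32, 32), nn.ReLU(),
        )
        self.lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
        self.mlp_out = nn.Sequential(
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, n_actions),  # final layer is linear, per the paper
        )

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim) observation histories
        x = self.mlp_in(obs_seq)
        x, hidden = self.lstm(x, hidden)
        return self.mlp_out(x), hidden  # per-step Q-values and LSTM state


def hysteretic_loss(q_pred, q_target, beta=0.3):
    """One plausible way to realize hysteresis with a gradient-based
    optimizer (an assumption, not necessarily the authors' exact scheme):
    scale the squared TD error by beta when the TD error is negative,
    so decreases in Q are learned more slowly than increases."""
    delta = q_target.detach() - q_pred
    weight = torch.where(delta >= 0,
                         torch.ones_like(delta),
                         torch.full_like(delta, beta))
    return (weight * delta.pow(2)).mean()


# Hypothetical wiring with the quoted hyperparameters; q_target would be
# r_t + gamma * max_a Q_target(o_{t+1}, a), as in DQN-style updates.
gamma = 0.95
net = DRQN(obs_dim=8, n_actions=5)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
```

Down-weighting negative TD errors (β < 1) keeps each agent optimistic in the face of teammates' exploratory actions, which is the motivation for hysteretic Q-learning that the paper builds on; the β range in the sketch matches the quoted 0.2 to 0.4.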