M^3RL: Mind-aware Multi-agent Management Reinforcement Learning
Authors: Tianmin Shu, Yuandong Tian
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have evaluated our approach in two environments, Resource Collection and Crafting, to simulate multi-agent management problems with various task settings and multiple designs for the worker agents. The experimental results have validated the effectiveness of our approach in modeling worker agents' minds online, and in achieving optimal ad-hoc teaming with good generalization and fast adaptation. |
| Researcher Affiliation | Collaboration | Tianmin Shu University of California, Los Angeles tianmin.shu@ucla.edu Yuandong Tian Facebook AI Research yuandong@fb.com |
| Pseudocode | Yes | We summarize the rollout algorithm and the learning algorithm in Algorithm 1 and Algorithm 2 respectively. |
| Open Source Code | Yes | 1Code is available at https://github.com/facebookresearch/M3RL. |
| Open Datasets | No | The paper describes custom-built environments ('Resource Collection' and 'Crafting') and the design of worker agents, but does not provide concrete access information (link, DOI, formal citation) to a publicly available dataset for these environments. |
| Dataset Splits | No | The paper describes maintaining and sampling from a population of worker agents during training and testing in simulated environments, but it does not specify explicit dataset splits (e.g., percentages or counts for training, validation, and test sets) as would be typical for a fixed dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions that 'All modules are trained with RMSProp', but it does not provide specific version numbers for any software dependencies, libraries, or programming languages used. |
| Experiment Setup | Yes | All modules are trained with RMSProp (Tieleman & Hinton, 2012) using a learning rate of 0.0004. We also adopt an agent-wise ϵ-greedy exploration, where a worker has a chance of ϵ of being assigned a random goal at the beginning of an episode. In Crafting, we set a fixed number of episodes at the beginning of training to be the warm-up phase. |
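
The experiment setup row above mentions an agent-wise ϵ-greedy exploration scheme in which each worker, at the start of an episode, has a probability ϵ of being assigned a random goal rather than the goal selected by the manager. The following is a minimal Python sketch of that idea only; the function and variable names (`assign_goals`, `manager_policy`, `num_goals`) are illustrative assumptions and do not come from the paper or its released code.

```python
import random

def assign_goals(manager_policy, workers, num_goals, epsilon):
    """Assign a goal index to each worker at the beginning of an episode.

    With probability `epsilon`, a worker receives a uniformly random goal
    (exploration); otherwise the manager's policy chooses the goal
    (exploitation). This mirrors the agent-wise epsilon-greedy exploration
    described in the quoted setup, under the naming assumptions above.
    """
    goals = []
    for worker in workers:
        if random.random() < epsilon:
            # Exploration branch: this worker gets a random goal this episode.
            goals.append(random.randrange(num_goals))
        else:
            # Exploitation branch: the manager's current policy picks the goal.
            goals.append(manager_policy(worker))
    return goals

# Example usage with a dummy manager policy that always proposes goal 0.
if __name__ == "__main__":
    workers = ["worker_a", "worker_b", "worker_c"]
    goals = assign_goals(lambda w: 0, workers, num_goals=4, epsilon=0.1)
    print(dict(zip(workers, goals)))
```

Because the exploration decision is made per worker rather than per episode, some workers in a team can explore while the rest follow the manager's assignments, which is what "agent-wise" refers to in the quoted passage.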