M^3RL: Mind-aware Multi-agent Management Reinforcement Learning
Authors: Tianmin Shu, Yuandong Tian
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have evaluated our approach in two environments, Resource Collection and Crafting, to simulate multi-agent management problems with various task settings and multiple designs for the worker agents. The experimental results have validated the effectiveness of our approach in modeling worker agents' minds online, and in achieving optimal ad-hoc teaming with good generalization and fast adaptation. |
| Researcher Affiliation | Collaboration | Tianmin Shu University of California, Los Angeles tianmin.shu@ucla.edu Yuandong Tian Facebook AI Research yuandong@fb.com |
| Pseudocode | Yes | We summarize the rollout algorithm and the learning algorithm in Algorithm 1 and Algorithm 2 respectively. |
| Open Source Code | Yes | 1Code is available at https://github.com/facebookresearch/M3RL. |
| Open Datasets | No | The paper describes custom-built environments ('Resource Collection' and 'Crafting') and the design of worker agents, but does not provide concrete access information (link, DOI, formal citation) to a publicly available dataset for these environments. |
| Dataset Splits | No | The paper describes maintaining and sampling from a population of worker agents during training and testing in simulated environments, but it does not specify explicit dataset splits (e.g., percentages or counts for training, validation, and test sets) as would be typical for a fixed dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions that 'All modules are trained with RMSProp', but it does not provide specific version numbers for any software dependencies, libraries, or programming languages used. |
| Experiment Setup | Yes | All modules are trained with RMSProp (Tieleman & Hinton, 2012) using a learning rate of 0.0004. We also adopt an agent-wise ϵ-greedy exploration, where a worker has a chance of ϵ of being assigned a random goal at the beginning of an episode. In Crafting, we set a fixed number of episodes at the beginning of training to be the warm-up phase. |
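
The experiment setup row above mentions an agent-wise ϵ-greedy exploration scheme in which each worker, at the start of an episode, has a probability ϵ of being assigned a random goal rather than the goal selected by the manager. The following is a minimal Python sketch of that idea only; the function and variable names (`assign_goals`, `manager_policy`, `num_goals`) are illustrative assumptions and do not come from the paper or its released code.

```python
import random

def assign_goals(manager_policy, workers, num_goals, epsilon):
    """Assign a goal index to each worker at the beginning of an episode.

    With probability `epsilon`, a worker receives a uniformly random goal
    (exploration); otherwise the manager's policy chooses the goal
    (exploitation). This mirrors the agent-wise epsilon-greedy exploration
    described in the quoted setup, under the naming assumptions above.
    """
    goals = []
    for worker in workers:
        if random.random() < epsilon:
            # Exploration branch: this worker gets a random goal this episode.
            goals.append(random.randrange(num_goals))
        else:
            # Exploitation branch: the manager's current policy picks the goal.
            goals.append(manager_policy(worker))
    return goals

# Example usage with a dummy manager policy that always proposes goal 0.
if __name__ == "__main__":
    workers = ["worker_a", "worker_b", "worker_c"]
    goals = assign_goals(lambda w: 0, workers, num_goals=4, epsilon=0.1)
    print(dict(zip(workers, goals)))
```

Because the exploration decision is made per worker rather than per episode, some workers in a team can explore while the rest follow the manager's assignments, which is what "agent-wise" refers to in the quoted passage.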