Towards a Better Understanding of Learning with Multiagent Teams

Authors: David Radke, Kate Larson, Tim Brecht, Kyle Tilbury

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We support our conclusions with both theoretical analysis and empirical results.
Researcher Affiliation | Academia | David R. Cheriton School of Computer Science, University of Waterloo
Pseudocode | No | The paper describes algorithms and methods but does not provide any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions third-party libraries and environments such as the RLlib RL library and links to environment implementations, but it does not provide an explicit statement of, or link to, the authors' own source code for the methodology described in the paper.
Open Datasets | Yes | Cleanup [Vinitsky et al., 2019] is a temporally and spatially extended Markov game representing a sequential social dilemma. Neural MMO (NMMO) [Suarez et al., 2019] is a large, customizable, and partially observable multiagent environment that supports foraging and exploration.
Dataset Splits | No | The paper reports experimental durations and numbers of trials (e.g., 'for 50 trials of 1,000 episodes (100 steps each)', 'for 10 trials of 1.6 × 10^8 environmental steps (1,000 timesteps per-episode)'), but it does not provide train/validation/test splits; the experiments use reinforcement learning environments in which data is generated dynamically during training rather than drawn from static datasets.
Hardware Specification | No | The paper states: 'We thank the Vector Institute for providing the compute resources necessary for this research to be conducted.' However, it does not specify exact hardware details such as GPU models, CPU models, or memory.
Software Dependencies | No | The paper mentions the RLlib RL library and algorithms such as Proximal Policy Optimization (PPO) and Tabular Q-Learning, but it does not provide version numbers for any of these software components (e.g., RLlib version, Python version, PyTorch/TensorFlow version).
Experiment Setup | Yes | Agents use Tabular Q-Learning [Sutton and Barto, 2018] with γ = 0.9 and ϵ-exploration (ϵ = 0.3) for 50 trials of 1,000 episodes (100 steps each). We implement Proximal Policy Optimization (PPO) [Schulman et al., 2017] agents for 10 trials of 1.6 × 10^8 environmental steps (1,000 timesteps per episode) using the RLlib RL library. We implement PPO agents for eight trials of 1.5 × 10^7 environmental timesteps (1,000 per episode) using RLlib.
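To make the Experiment Setup row above concrete, the following is a minimal Python sketch of the tabular Q-learning configuration it quotes: ϵ-greedy exploration with ϵ = 0.3, discount γ = 0.9, and 1,000 episodes of 100 steps each. This is not the authors' released code (none is available); the environment interface (reset/step returning state, reward, done), the action-space size, and the learning rate alpha are assumptions not stated in the excerpt.

import random
from collections import defaultdict

# Hyperparameters quoted in the paper's experiment setup.
GAMMA = 0.9       # discount factor (from the paper)
EPSILON = 0.3     # epsilon-greedy exploration rate (from the paper)
ALPHA = 0.1       # learning rate (not reported in the excerpt; assumed here)
N_EPISODES = 1000
MAX_STEPS = 100


def train(env, n_actions):
    """Train one agent with tabular Q-learning on an env with an assumed
    reset()/step(action) -> (next_state, reward, done) interface."""
    q = defaultdict(lambda: [0.0] * n_actions)  # Q[state][action]

    for _ in range(N_EPISODES):
        state = env.reset()
        for _ in range(MAX_STEPS):
            # Epsilon-greedy action selection.
            if random.random() < EPSILON:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[state][a])

            next_state, reward, done = env.step(action)

            # One-step Q-learning update toward the bootstrapped target.
            target = reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])

            state = next_state
            if done:
                break
    return q

The PPO experiments quoted in the same row were run with the RLlib RL library rather than a hand-written loop, so the analogous sketch there would be an RLlib configuration; since the paper does not report library versions or configuration files, no such configuration is reproduced here.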