Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks

Authors: Amanpreet Singh, Tushar Jain, Sainbayar Sukhbaatar

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Using a variety of tasks including StarCraft BroodWars™ explore and combat scenarios, we show that our network yields improved performance and convergence rates over the baselines as the scale increases. Our results convey that IC3Net agents learn when to communicate based on the scenario and profitability." (Section 4, Experiments)
Researcher Affiliation | Collaboration | Amanpreet Singh (New York University, Facebook AI Research) amanpreet@nyu.edu; Tushar Jain (New York University) tushar@nyu.edu; Sainbayar Sukhbaatar (New York University, Facebook AI Research) sainbar@cs.nyu.edu
Pseudocode | No | The paper describes the model (IC3Net) in text and mathematical equations in Section 3 but includes no formal pseudocode or algorithm blocks.
Open Source Code | Yes | "The code is available at https://github.com/IC3Net/IC3Net."
Open Datasets | Yes | "We consider three environments for our analysis and experiments. (i) a predator-prey environment (PP)... (ii) a traffic junction environment (TJ) similar to Sukhbaatar et al. (2016)... (iii) StarCraft BroodWars (SC) explore and combat tasks... We implement our model using PyTorch and environments using Gym (Brockman et al., 2016)."
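As a rough illustration of the Gym-style `reset`/`step` interface the quoted environments follow, here is a minimal predator-prey skeleton. The class name, grid size, and reward values are assumptions for illustration only, not the authors' implementation:

```python
import random


class PredatorPreyEnv:
    """Toy single-predator grid world with a Gym-style reset/step API.

    All specifics (5x5 grid, -0.05 time penalty, +1.0 capture reward)
    are illustrative assumptions, not taken from the IC3Net code.
    """

    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]  # up, down, right, left, stay

    def __init__(self, grid=5, seed=0):
        self.grid = grid
        self.rng = random.Random(seed)

    def reset(self):
        # Predator starts in a corner; prey spawns at a random cell.
        self.predator = [0, 0]
        self.prey = [self.rng.randrange(self.grid), self.rng.randrange(self.grid)]
        return tuple(self.predator + self.prey)

    def step(self, action):
        # Move the predator, clipped to the grid boundaries.
        dx, dy = self.ACTIONS[action]
        self.predator[0] = min(max(self.predator[0] + dx, 0), self.grid - 1)
        self.predator[1] = min(max(self.predator[1] + dy, 0), self.grid - 1)
        done = self.predator == self.prey
        reward = 1.0 if done else -0.05  # capture bonus vs. per-step time penalty
        obs = tuple(self.predator + self.prey)
        return obs, reward, done, {}
```

A multi-agent version would return one observation and reward per predator, which is the setting the paper actually studies.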
Dataset Splits | No | The paper specifies training lengths ("1000 epochs", "2000 epochs") and discusses the "final results" of experiments, but it does not describe any validation split (e.g., as a percentage or sample count).
Hardware Specification | No | The paper states that "The training is distributed over 16 cores and each core runs a mini-batch till total episode steps are 500 or more," but it does not specify CPU models, GPU models, or other hardware details.
Software Dependencies | No | The paper names its software ("We implement our model using PyTorch and environments using Gym (Brockman et al., 2016). We use RMSProp (Tieleman & Hinton, 2012) with initial learning rate as a tuned hyper-parameter.") but gives no version numbers.
Experiment Setup | Yes | "We set the hidden layer size to 128 units and we use LSTM (Hochreiter & Schmidhuber, 1997) with recurrence for all of the baselines and IC3Net. We use RMSProp (Tieleman & Hinton, 2012) with the initial learning rate as a tuned hyper-parameter. All of the models use skip-connections (He et al., 2016). The training is distributed over 16 cores and each core runs a mini-batch till total episode steps are 500 or more. We do 10 weight updates per epoch. We run predator-prey and StarCraft experiments for 1000 epochs, the traffic junction experiment for 2000 epochs, and report the final results. We utilized curriculum learning (Bengio et al., 2009) to make the training process easier... The learning rate is fixed at 0.003 throughout."
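The quoted hyper-parameters can be collected into a single configuration sketch, which makes the reported setup easy to check at a glance. The dictionary keys and the helper name below are illustrative, not taken from the authors' repository:

```python
# Hedged sketch: the training setup reported in the paper, gathered into one
# config object. Key names are our own; values come from the quoted text.
TRAINING_CONFIG = {
    "hidden_size": 128,            # LSTM hidden units, shared by baselines and IC3Net
    "recurrent": True,             # LSTM (Hochreiter & Schmidhuber, 1997) recurrence
    "optimizer": "RMSProp",        # Tieleman & Hinton (2012)
    "learning_rate": 0.003,        # fixed throughout, per the paper
    "skip_connections": True,      # He et al. (2016)
    "num_workers": 16,             # training distributed over 16 cores
    "batch_episode_steps": 500,    # each worker runs until >= 500 episode steps
    "updates_per_epoch": 10,       # weight updates per epoch
    "epochs": {
        "predator_prey": 1000,
        "starcraft": 1000,
        "traffic_junction": 2000,
    },
}


def epochs_for(task: str) -> int:
    """Return the reported training length (in epochs) for a given task."""
    return TRAINING_CONFIG["epochs"][task]
```

Note the tension preserved from the paper's own text: the learning rate is described both as a tuned hyper-parameter and as fixed at 0.003; the config above records the fixed value.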