Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks
Authors: Amanpreet Singh, Tushar Jain, Sainbayar Sukhbaatar
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using a variety of tasks including StarCraft BroodWars™ explore and combat scenarios, we show that our network yields improved performance and convergence rates than the baselines as the scale increases. Our results convey that IC3Net agents learn when to communicate based on the scenario and profitability. (Section 4: Experiments) |
| Researcher Affiliation | Collaboration | Amanpreet Singh (New York University, Facebook AI Research) amanpreet@nyu.edu; Tushar Jain (New York University) tushar@nyu.edu; Sainbayar Sukhbaatar (New York University, Facebook AI Research) sainbar@cs.nyu.edu |
| Pseudocode | No | The paper describes the model (IC3Net) using text and mathematical equations in Section 3 but does not include any formal pseudocode or algorithm blocks; a hedged sketch of the described communication mechanism is given below the table. |
| Open Source Code | Yes | The code is available at https://github.com/IC3Net/IC3Net. |
| Open Datasets | Yes | We consider three environments for our analysis and experiments. (i) a predator-prey environment (PP)... (ii) a traffic junction environment (TJ) similar to Sukhbaatar et al. (2016)... (iii) StarCraft BroodWars (SC) explore and combat tasks... We implement our model using PyTorch and environments using Gym (Brockman et al., 2016). |
| Dataset Splits | No | The paper specifies training epochs ('1000 epochs', '2000 epochs') and discusses the 'final results' of experiments, but it does not explicitly mention or detail a specific dataset split for validation (e.g., as a percentage or sample count). |
| Hardware Specification | No | The paper mentions that 'The training is distributed over 16 cores and each core runs a mini-batch till total episodes steps are 500 or more,' but it does not specify any particular CPU models, GPU models, or other detailed hardware specifications used for the experiments. |
| Software Dependencies | No | We implement our model using PyTorch and environments using Gym (Brockman et al., 2016). We use RMSProp (Tieleman & Hinton, 2012) with initial learning rate as a tuned hyper-parameter. |
| Experiment Setup | Yes | We set the hidden layer size to 128 units and we use LSTM (Hochreiter & Schmidhuber, 1997) with recurrence for all of the baselines and IC3Net. We use RMSProp (Tieleman & Hinton, 2012) with initial learning rate as a tuned hyper-parameter. All of the models use skip-connections (He et al., 2016). The training is distributed over 16 cores and each core runs a mini-batch till total episode steps are 500 or more. We do 10 weight updates per epoch. We run predator-prey, StarCraft experiments for 1000 epochs, traffic junction experiment for 2000 epochs and report the final results. We utilized curriculum learning (Bengio et al., 2009) to make the training process easier... The learning rate is fixed at 0.003 throughout. (A training-setup sketch is given below the table.) |
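
The paper gives no pseudocode, so the following is a minimal, hypothetical PyTorch sketch of the gated communication step described in Section 3 of the paper (see the Pseudocode row above). The class name `IC3NetCellSketch`, the linear gating head, and the exact way gated hidden states are averaged are our own assumptions, not the authors' released code, which lives in the linked repository.

```python
import torch
import torch.nn as nn


class IC3NetCellSketch(nn.Module):
    """Hypothetical sketch of one IC3Net recurrence + communication step.

    Each agent encodes its observation, adds the incoming communication
    vector, and updates an LSTM state. A binary gate, sampled from a small
    head on the previous hidden state, decides whether the agent broadcasts;
    the next communication vector is the projected average of the other
    agents' gated hidden states. Details may differ from the authors' code.
    """

    def __init__(self, obs_dim, hidden_dim=128):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)       # e(o_t)
        self.lstm = nn.LSTMCell(hidden_dim, hidden_dim)
        self.comm_proj = nn.Linear(hidden_dim, hidden_dim)  # shared projection
        self.gate_head = nn.Linear(hidden_dim, 2)           # "communicate or not"

    def forward(self, obs, comm, h, c):
        # obs: (n_agents, obs_dim); comm, h, c: (n_agents, hidden_dim)
        x = self.encoder(obs) + comm
        h_next, c_next = self.lstm(x, (h, c))

        # Sample the per-agent binary communication gate from the previous state.
        gate = torch.distributions.Categorical(
            logits=self.gate_head(h)).sample().float().unsqueeze(-1)

        # Each agent receives the mean of the *other* agents' gated hidden states.
        gated = self.comm_proj(h_next) * gate
        comm_next = (gated.sum(dim=0, keepdim=True) - gated) / max(obs.size(0) - 1, 1)
        return h_next, c_next, comm_next, gate
```

A policy head mapping `h_next` to environment actions, and the individualized-reward training signal, would sit on top of this cell; both are omitted here.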
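
The optimizer and architecture choices quoted in the Experiment Setup row translate directly into a few lines of configuration. This is a sketch under stated assumptions: the observation size and the placeholder recurrent policy stand in for whatever each task actually requires, and the distribution over 16 cores, the curriculum schedule, and the policy-gradient loss itself are not shown.

```python
import torch
import torch.nn as nn

HIDDEN_DIM = 128        # "hidden layer size to 128 units"
LEARNING_RATE = 0.003   # "The learning rate is fixed at 0.003 throughout."
EPOCHS = 1000           # predator-prey / StarCraft; 2000 for traffic junction
UPDATES_PER_EPOCH = 10  # "We do 10 weight updates per epoch."

# Placeholder recurrent policy standing in for IC3Net or a baseline;
# the input size (64) is purely illustrative.
policy = nn.LSTMCell(input_size=64, hidden_size=HIDDEN_DIM)

# RMSProp with the initial learning rate as a tuned hyper-parameter.
optimizer = torch.optim.RMSprop(policy.parameters(), lr=LEARNING_RATE)

for epoch in range(EPOCHS):
    for _ in range(UPDATES_PER_EPOCH):
        # Roll out episodes until at least 500 total environment steps are
        # collected (distributed over 16 cores in the paper), compute the
        # policy-gradient loss, then:
        #   optimizer.zero_grad(); loss.backward(); optimizer.step()
        pass
```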