Learning to Schedule Communication in Multi-agent Reinforcement Learning

Authors: Daewoo Kim, Sangwoo Moon, David Hostallero, Wan Ju Kang, Taeyoung Lee, Kyunghwan Son, Yung Yi

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate SchedNet against multiple baselines in two different applications, namely cooperative communication and navigation, and predator-prey. Our experiments show a non-negligible performance gap, ranging from 32% to 43%, between SchedNet and other mechanisms, such as those without communication or with vanilla scheduling methods (e.g., round robin).
Researcher Affiliation | Academia | Daewoo Kim, Sangwoo Moon, David Hostallero, Wan Ju Kang, Taeyoung Lee, Kyunghwan Son & Yung Yi, School of Electrical Engineering, KAIST, Daejeon, South Korea
Pseudocode | Yes | Algorithm 1 (SchedNet) is given in pseudocode; a hedged sketch of the weight-based Top(k) scheduling step it relies on appears after this table.
Open Source Code | Yes | The code is available at https://github.com/rhoowd/sched_net
Open Datasets | Yes | Environments: To evaluate SchedNet, we consider two different environments for demonstrative purposes: Predator and Prey (PP), which is used in Stone & Veloso (2000), and Cooperative Communication and Navigation (CCN), which is a simplified version of the one in Lowe et al. (2017).
Dataset Splits | No | The paper describes simulated environments and training steps but does not provide explicit training/validation/test splits, since the experiments rely on simulated interaction rather than a static dataset.
Hardware Specification | No | The paper does not describe the hardware (e.g., GPU models, CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify version numbers for any software components, libraries, or frameworks.
Experiment Setup | Yes | Table 1 shows the hyperparameter values for the CCN and PP tasks:

Hyperparameter | Value | Description
training step | 750000 | Maximum time steps until the end of training
episode length | 1000 | Maximum time steps per episode
discount factor | 0.9 | Importance of future rewards
learning rate for actor | 0.00001 | Actor network learning rate used by the Adam optimizer
learning rate for critic | 0.0001 | Critic network learning rate used by the Adam optimizer
target update rate | 0.05 | Target network update rate to track the learned network
entropy regularization weight | 0.01 | Weight of regularization to encourage exploration
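
To make the Table 1 values concrete, here is a minimal sketch of how they could be gathered into a training configuration, together with the soft target-network update that a target update rate of 0.05 typically refers to. This is an illustration only, not the authors' code; the dictionary keys, the `soft_update` helper, and the exact update rule are assumptions.

```python
# Hyperparameters from Table 1, gathered into one configuration dictionary.
# The key names are illustrative; they do not come from the SchedNet repository.
CONFIG = {
    "training_steps": 750_000,   # maximum time steps until the end of training
    "episode_length": 1_000,     # maximum time steps per episode
    "gamma": 0.9,                # discount factor (importance of future rewards)
    "lr_actor": 1e-5,            # actor learning rate for the Adam optimizer
    "lr_critic": 1e-4,           # critic learning rate for the Adam optimizer
    "tau": 0.05,                 # target network update rate
    "entropy_coef": 0.01,        # entropy regularization weight
}

def soft_update(online_params, target_params, tau=CONFIG["tau"]):
    """Common soft target-network update (assumed form, not verified against
    the authors' code): theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [tau * p + (1.0 - tau) * tp for p, tp in zip(online_params, target_params)]

# Example: with tau = 0.05 the target parameter moves 5% of the way
# toward the learned parameter at each update.
print(soft_update([1.0], [0.0]))  # -> [0.05]
```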
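
As noted in the Pseudocode row above, Algorithm 1 relies on a weight-based scheduler that allows only a limited number of agents to broadcast at each step. The snippet below is a minimal sketch of the Top(k) selection idea as described in the paper; the function name and interface are assumptions, not the authors' implementation.

```python
import numpy as np

def top_k_schedule(weights, k):
    """Return a binary schedule vector c with c[i] = 1 for the k agents
    whose scheduling weights are largest (they are allowed to broadcast)."""
    weights = np.asarray(weights, dtype=float)
    chosen = np.argsort(weights)[-k:]            # indices of the k largest weights
    schedule = np.zeros(len(weights), dtype=int)
    schedule[chosen] = 1
    return schedule

# Example: 4 agents, bandwidth allows 2 simultaneous messages.
print(top_k_schedule([0.1, 0.7, 0.3, 0.9], k=2))  # -> [0 1 0 1]
```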