Learning Expensive Coordination: An Event-Based Deep RL Approach

Authors: Zhenyu Shi*, Runsheng Yu*, Xinrun Wang*, Rundong Wang, Youzhi Zhang, Hanjiang Lai, Bo An

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments in resource collections, navigation, and the predator-prey game reveal that our approach outperforms the state-of-the-art methods dramatically." (Section 5: Experimental Results)
Researcher Affiliation | Academia | School of Computer Science and Engineering, Nanyang Technological University, Singapore (runshengyu@gmail.com, {xwang033,rundong001,yzhang137}@e.ntu.edu.sg, boan@ntu.edu.sg); Zhenyu Shi & Hanjiang Lai, School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China (shizhy6@mail2.sysu.edu.cn, laihanj3@mail.sysu.edu.cn)
Pseudocode | Yes | "The pseudo-code can be found in Appendix C." (Algorithm 1: EBPG; Algorithm 2: Action Choices for Leader)
Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper mentions tasks such as 'Resource Collections', 'Modified Navigation', and 'Modified Predator-Prey', adapted from prior work, but does not provide links, DOIs, repositories, or formal citations (with authors and years) for public access to the modified datasets or environments used in the experiments.
Dataset Splits | No | The paper states "The total training episode is 250,000 for all the tasks" but does not specify exact training, validation, and test splits (e.g., percentages, sample counts, or references to predefined splits).
Hardware Specification | Yes | "Our method takes less than two days to train on an NVIDIA GeForce GTX 1080Ti GPU in each experiment."
Software Dependencies | No | "Our code is implemented in Pytorch (Paszke et al., 2017). The optimization algorithm is Adam (Kingma & Ba, 2014)." (No specific version number for PyTorch is provided.)
Experiment Setup | Yes | "If no special mention, the batch size is 1 (online learning). Similar to (Shu & Tian, 2019), we set the learning rate to 0.001 for the leader's critic and the followers, and 0.0003 for the leader's policy. The optimization algorithm is Adam (Kingma & Ba, 2014). For the loss function, we set λ1 = 0.01 and λ2 = 0.001. The total training episode is 250,000 for all the tasks (including both the rule-based followers and the RL-based followers). To encourage exploration, we use ε-greedy. For the leader, the exploration rate is set to 0.1 and decreases gradually to zero over 5,000 episodes. For the followers, the exploration rate for each agent is always 0.3 (except for the noise experiments)."
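For concreteness, the sketch below shows how the quoted training configuration could be wired up in PyTorch. It is a minimal illustration, not the authors' code: the network shapes, the module names (leader_policy, leader_critic, follower_policies), and the linear decay schedule are assumptions; only the optimizer choice, learning rates, loss weights, episode count, and exploration rates come from the quoted text.

```python
# Hypothetical sketch of the reported training configuration (not the authors' code).
# Values taken from the paper's setup: Adam; lr 0.001 for the leader's critic and the
# followers; lr 0.0003 for the leader's policy; lambda1 = 0.01, lambda2 = 0.001;
# 250,000 training episodes; leader epsilon 0.1 -> 0 over 5,000 episodes; follower
# epsilon fixed at 0.3. Everything else (architectures, decay shape) is assumed.
import torch
import torch.nn as nn

# Placeholder networks; the actual architectures are described in the paper's appendix.
leader_policy = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 8))
leader_critic = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))
follower_policies = [
    nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 4)) for _ in range(3)
]

# Adam optimizers with the learning rates quoted above.
opt_leader_policy = torch.optim.Adam(leader_policy.parameters(), lr=3e-4)
opt_leader_critic = torch.optim.Adam(leader_critic.parameters(), lr=1e-3)
opt_followers = [torch.optim.Adam(f.parameters(), lr=1e-3) for f in follower_policies]

# Loss weights for the two regularization terms (the exact loss form is in the paper).
LAMBDA_1, LAMBDA_2 = 0.01, 0.001

TOTAL_EPISODES = 250_000
FOLLOWER_EPSILON = 0.3  # fixed, except for the noise experiments


def leader_epsilon(episode: int) -> float:
    """Leader exploration rate: 0.1 decaying to zero over the first 5,000 episodes.

    The paper only says the rate "decreases gradually to zero"; a linear schedule
    is an assumption made here for illustration.
    """
    return max(0.0, 0.1 * (1.0 - episode / 5_000))
```

With batch size 1 (online learning), each episode would contribute a single update per optimizer; the schedule above is queried once per episode to set the leader's exploration rate.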