Learning Expensive Coordination: An Event-Based Deep RL Approach
Authors: Zhenyu Shi*, Runsheng Yu*, Xinrun Wang*, Rundong Wang, Youzhi Zhang, Hanjiang Lai, Bo An
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments in resource collections, navigation, and the predator-prey game reveal that our approach outperforms the state-of-the-art methods dramatically. (Section 5, Experimental Results) |
| Researcher Affiliation | Academia | School of Computer Science and Engineering, Nanyang Technological University, Singapore (runshengyu@gmail.com, {xwang033,rundong001,yzhang137}@e.ntu.edu.sg, boan@ntu.edu.sg); Zhenyu Shi & Hanjiang Lai, School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China (shizhy6@mail2.sysu.edu.cn, laihanj3@mail.sysu.edu.cn) |
| Pseudocode | Yes | The pseudo-code can be found in Appendix C (Algorithm 1: EBPG; Algorithm 2: Action Choices for Leader). |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the methodology described is publicly available. |
| Open Datasets | No | The paper mentions tasks like 'Resource Collections', 'Modified Navigation', and 'Modified Predator-Prey' based on prior work but does not provide specific links, DOIs, repositories, or formal citations with authors and years for public access to the *modified* datasets or environments used in their experiments. |
| Dataset Splits | No | The paper mentions 'The total training episode is 250,000 for all the tasks' but does not specify exact training, validation, and test dataset splits (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | Yes | Our method takes less than two days to train on a NVIDIA Geforce GTX 1080Ti GPU in each experiment. |
| Software Dependencies | No | Our code is implemented in Pytorch (Paszke et al., 2017). The optimization algorithm is Adam (Kingma & Ba, 2014). (No PyTorch version number is provided.) |
| Experiment Setup | Yes | Unless otherwise mentioned, the batch size is 1 (online learning). Similar to (Shu & Tian, 2019), the learning rate is 0.001 for the leader's critic and the followers, and 0.0003 for the leader's policy. The optimization algorithm is Adam (Kingma & Ba, 2014). For the loss function, λ1 = 0.01 and λ2 = 0.001. The total number of training episodes is 250,000 for all tasks (including both the rule-based followers and the RL-based followers). To encourage exploration, ϵ-greedy is used: for the leader, the exploration rate is set to 0.1 and gradually decreases to zero over 5,000 episodes; for the followers, the exploration rate of each agent is fixed at 0.3 (except for the noise experiments). |
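
For reference, the hyperparameters quoted above can be collected into a small PyTorch configuration sketch. The network modules, their sizes, and the number of followers below are placeholders (the actual architectures are described in the paper's appendix); only the learning rates, the Adam optimizer, the loss coefficients, the episode count, and the exploration schedule come from the quoted setup.

```python
import torch
import torch.nn as nn

# Placeholder networks; the real architectures are defined in the paper's appendix.
leader_policy = nn.Linear(64, 8)                   # hypothetical leader policy network
leader_critic = nn.Linear(64, 1)                   # hypothetical leader critic network
followers = [nn.Linear(64, 8) for _ in range(3)]   # hypothetical follower networks

# Learning rates as reported: 0.001 for the leader's critic and the followers,
# 0.0003 for the leader's policy; Adam is used throughout.
policy_opt = torch.optim.Adam(leader_policy.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam(leader_critic.parameters(), lr=1e-3)
follower_opts = [torch.optim.Adam(f.parameters(), lr=1e-3) for f in followers]

# Loss coefficients and training length as reported.
lambda_1, lambda_2 = 0.01, 0.001
total_episodes = 250_000

def leader_epsilon(episode: int, start: float = 0.1, decay_episodes: int = 5_000) -> float:
    """Leader exploration rate: starts at 0.1 and decays to zero over the
    first 5,000 episodes (linear decay assumed here)."""
    return max(0.0, start * (1.0 - episode / decay_episodes))

follower_epsilon = 0.3  # constant follower exploration rate (except noise experiments)
```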