Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems
Authors: Jiayu Chen, Yuanxin Zhang, Yuanfan Xu, Huimin Ma, Huazhong Yang, Jiaming Song, Yu Wang, Yi Wu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results show that VACL solves a collection of sparse-reward problems with a large number of agents. Particularly, using a single desktop machine, VACL achieves 98% coverage rate with 100 agents in the simple-spread benchmark and reproduces the ramp-use behavior originally shown in OpenAI's hide-and-seek project. |
| Researcher Affiliation | Academia | 1 Tsinghua University, 2 Shanghai Qi Zhi Institute, 3 University of Science and Technology Beijing, 4 Stanford University |
| Pseudocode | Yes | Algorithm 1: The VACL Algorithm |
| Open Source Code | Yes | Our project website is at https://sites.google.com/view/vacl-neurips-2021. (This project website links to a GitHub repository: https://github.com/PKU-RL/VACL) |
| Open Datasets | Yes | We consider four tasks over two environments, Simple-Spread and Push-Ball in the multi-agent particle-world environment (MPE) [19], and Ramp-Use and Lock-and-Return in the MuJoCo-based hide-and-seek environment (HnS) [2]. |
| Dataset Splits | No | The paper references datasets/environments (MPE, HnS) but does not provide specific train/validation/test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) needed for reproduction. |
| Hardware Specification | Yes | Every experiment is repeated over 3 seeds and performed on a desktop machine with one 64-core CPU and one 2080-Ti GPU, which is used for forward action computation and training updates. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1) needed to replicate the experiment. |
| Experiment Setup | Yes | Every experiment is repeated over 3 seeds and performed on a desktop machine with one 64-core CPU and one 2080-Ti GPU, which is used for forward action computation and training updates. For PC-Unif and VACL, we start with n0 = 4 in Simple-Spread and n0 = 2 in Push-Ball and then switch to the desired agent number. |