Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems

Authors: Jiayu Chen, Yuanxin Zhang, Yuanfan Xu, Huimin Ma, Huazhong Yang, Jiaming Song, Yu Wang, Yi Wu

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results show that VACL solves a collection of sparse-reward problems with a large number of agents. Particularly, using a single desktop machine, VACL achieves a 98% coverage rate with 100 agents in the simple-spread benchmark and reproduces the ramp-use behavior originally shown in OpenAI's hide-and-seek project. |
| Researcher Affiliation | Academia | 1 Tsinghua University, 2 Shanghai Qi Zhi Institute, 3 University of Science and Technology Beijing, 4 Stanford University |
| Pseudocode | Yes | Algorithm 1: The VACL Algorithm |
| Open Source Code | Yes | Our project website is at https://sites.google.com/view/vacl-neurips-2021. (This project website links to a GitHub repository: https://github.com/PKU-RL/VACL) |
| Open Datasets | Yes | We consider four tasks over two environments, Simple-Spread and Push-Ball in the multi-agent particle-world environment (MPE) [19], and Ramp-Use and Lock-and-Return in the MuJoCo-based hide-and-seek environment (HnS) [2]. (See the environment sketch below the table.) |
| Dataset Splits | No | The paper references datasets/environments (MPE, HnS) but does not provide specific train/validation/test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) needed for reproduction. |
| Hardware Specification | Yes | Every experiment is repeated over 3 seeds and performed on a desktop machine with one 64-core CPU and one 2080 Ti GPU, which is used for forward action computation and training updates. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1) needed to replicate the experiment. |
| Experiment Setup | Yes | Every experiment is repeated over 3 seeds and performed on a desktop machine with one 64-core CPU and one 2080 Ti GPU, which is used for forward action computation and training updates. For PC-Unif and VACL, we start with n0 = 4 in Simple-Spread and n0 = 2 in Push-Ball and then switch to the desired agent number. (See the schedule sketch below the table.) |
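
For concreteness, here is a minimal rollout in the Simple-Spread task named in the Open Datasets row. This is a hedged sketch: it uses the PettingZoo port of MPE, so the module name `simple_spread_v3` and its keyword arguments are PettingZoo conventions rather than anything from the paper, which cites the original multi-agent particle-world implementation [19]; the random policy is a stand-in for the trained agents.

```python
# Minimal sketch: random-action rollout in MPE Simple-Spread.
# Assumes the PettingZoo port of MPE; `simple_spread_v3` and its
# parameters are PettingZoo conventions, not taken from the paper,
# which uses the original multi-agent particle-world environment [19].
from pettingzoo.mpe import simple_spread_v3

# N sets the number of agents (and landmarks); VACL's curriculum
# eventually scales this up to 100 agents in the paper's experiments.
env = simple_spread_v3.parallel_env(N=4, max_cycles=25, continuous_actions=False)
observations, infos = env.reset(seed=0)

while env.agents:
    # Random policy as a placeholder; VACL would query its trained policy here.
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)

env.close()
```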
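
The Experiment Setup row describes a two-stage agent-count schedule: training starts with a small population (n0 = 4 in Simple-Spread, n0 = 2 in Push-Ball) and later switches to the desired agent number. A minimal sketch of such a schedule follows; the success-rate trigger and every name in it are hypothetical illustrations — the paper's actual curriculum mechanism is Algorithm 1 (VACL), not this rule.

```python
# Hypothetical sketch of the two-stage agent-count schedule quoted in
# the Experiment Setup row. The success-rate threshold is an assumed
# switching criterion for illustration only; VACL's real curriculum
# update is specified by Algorithm 1 in the paper.
STARTING_AGENTS = {"simple_spread": 4, "push_ball": 2}  # n0 values from the paper

def agent_count(task: str, target_agents: int, success_rate: float,
                threshold: float = 0.9) -> int:
    """Number of agents to train with at the current curriculum stage."""
    n0 = STARTING_AGENTS[task]
    # Stay at the easy stage until the policy is reliable, then switch.
    return n0 if success_rate < threshold else target_agents

# Example: Simple-Spread with the paper's 100-agent target.
assert agent_count("simple_spread", 100, success_rate=0.30) == 4    # early training
assert agent_count("simple_spread", 100, success_rate=0.95) == 100  # after the switch
```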