VAST: Value Function Factorization with Variable Agent Sub-Teams

Authors: Thomy Phan, Fabian Ritz, Lenz Belzner, Philipp Altmann, Thomas Gabor, Claudia Linnhoff-Popien

NeurIPS 2021

Reproducibility assessment. Each variable below lists the assessed result together with the supporting evidence extracted from the paper.
Research Type: Experimental
Evidence: "We evaluate VAST in three multi-agent domains and show that VAST can significantly outperform state-of-the-art VFF, when the number of agents is sufficiently large. [...] 5 Experimental Setup [...] 6.1 Comparison of Value Function Factorization Operators for VAST [...] 6.2 State-of-the-Art Comparison"
Researcher Affiliation: Academia
Evidence: "Thomy Phan¹, Fabian Ritz¹, Lenz Belzner², Philipp Altmann¹, Thomas Gabor¹, Claudia Linnhoff-Popien¹ (¹LMU Munich, ²Technische Hochschule Ingolstadt)"
Pseudocode: Yes
Evidence: "Algorithm 1 Variable Agent Sub-Teams"
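The caption above names the paper's Algorithm 1, but the assessment only records its existence. As orientation, here is a minimal sketch of the sub-team idea the title suggests, assuming (our assumption, not the paper's Algorithm 1) that agents are grouped into sub-teams of possibly different sizes and that per-agent utilities are summed within each sub-team before a factorization operator combines the sub-team values; all names and tensor shapes are illustrative.

```python
import torch

def subteam_values(agent_utilities: torch.Tensor,
                   assignment: torch.Tensor,
                   num_subteams: int) -> torch.Tensor:
    # agent_utilities: (batch, num_agents) chosen-action utilities per agent.
    # assignment: (num_agents,) int64 sub-team index per agent; sub-teams may
    # differ in size, hence "variable agent sub-teams".
    batch = agent_utilities.shape[0]
    values = agent_utilities.new_zeros(batch, num_subteams)
    index = assignment.expand(batch, -1)  # broadcast to (batch, num_agents)
    return values.scatter_add(1, index, agent_utilities)

# Example: 5 agents grouped into sub-teams of sizes 3 and 2.
utils = torch.randn(32, 5)                             # batch of 32 transitions
groups = torch.tensor([0, 0, 0, 1, 1])
v_sub = subteam_values(utils, groups, num_subteams=2)  # shape (32, 2)
v_joint = v_sub.sum(dim=1)  # simplest factorization operator (VDN-style sum)
```

A monotonic mixing network (as in QMIX) could replace the final sum; Section 6.1 of the paper compares different factorization operators in this role.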
Open Source Code: Yes
Evidence: "Code and README are available at https://github.com/thomyphan/scalable-marl."
Open Datasets: No
Evidence: Does not apply; only simulated data from the domains described in Section 5 was used. The paper describes custom-built grid-world environments (Warehouse[N], Battle[N], Gaussian Squeeze[N]) but provides no concrete access information for a publicly available dataset used for training.
Dataset Splits: No
Evidence: Appendix A.1.1 (General Training Details): "We applied early stopping to prevent overfitting and selected hyperparameters based on the validation performance." Validation is mentioned for early stopping and hyperparameter selection, but the paper does not specify concrete dataset splits (e.g., percentages or sample counts) for training, validation, and test sets. The experiments are conducted in simulated environments rather than on traditional datasets with predefined splits.
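The quoted passage names early stopping on validation performance but gives no further detail. A generic sketch of that pattern follows; `train_step`, `evaluate`, and the patience value are illustrative assumptions, not details from the paper.

```python
def train_with_early_stopping(train_step, evaluate, max_episodes, patience=10):
    # Stop training once the validation score has not improved for
    # `patience` consecutive evaluations. The patience value is assumed;
    # the paper only states that early stopping was applied.
    best_score, stale = float("-inf"), 0
    for _ in range(max_episodes):
        train_step()                # one training episode/update
        score = evaluate()          # scalar validation score, higher is better
        if score > best_score:
            best_score, stale = score, 0  # improvement: reset the counter
        else:
            stale += 1
            if stale >= patience:
                break
    return best_score
```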
Hardware Specification: Yes
Evidence: "All experiments ran on compute servers equipped with Intel Xeon E5-2630 v4 (10 cores), NVIDIA Quadro RTX 5000 (16 GB), and 256 GB RAM."
Software Dependencies: Yes
Evidence: "We used Python 3.8.5, PyTorch 1.8.0, and CUDA 11.1 for all experiments."
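For reproduction attempts, the reported versions can be verified up front. This is a small sanity-check sketch; the assertions merely mirror the versions quoted above, and `torch.version.cuda` is `None` on CPU-only builds.

```python
import sys
import torch

# Reported environment: Python 3.8.5, PyTorch 1.8.0, CUDA 11.1.
assert sys.version_info[:2] == (3, 8), "paper reports Python 3.8.5"
assert torch.__version__.startswith("1.8"), "paper reports PyTorch 1.8.0"
assert torch.version.cuda == "11.1", "paper reports CUDA 11.1"
```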
Experiment Setup: Yes
Evidence: "Further details on the training setup and the experiments are specified in Appendix A.1 and A.2." Appendix A.1.1 (General Training Details): "All parameters for the networks are initialized uniformly randomly within [−0.01, 0.01]. We used Adam optimizer [17] with a learning rate of 5e-4 and epsilon 1e-5. [...] We used a batch size of 32." Appendix A.1.2 (Domain-Specific Training Details): "Warehouse[N]: number of episodes 30000, Battle[N]: number of episodes 100000, Gaussian Squeeze[N]: number of episodes 50000."
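The quoted hyperparameters translate directly into a few lines of PyTorch. Below is a minimal sketch, assuming a hypothetical stand-in network (the paper's actual architectures are described in its appendix); only the initialization range, the Adam settings, and the batch size are taken from the quotes above.

```python
import torch
import torch.nn as nn

def init_uniform(module: nn.Module) -> None:
    # All parameters initialized uniformly at random within [-0.01, 0.01],
    # as stated in Appendix A.1.1.
    for p in module.parameters():
        nn.init.uniform_(p, -0.01, 0.01)

# Hypothetical stand-in network; the paper's architectures differ.
q_net = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 5))
init_uniform(q_net)

# Adam with learning rate 5e-4 and epsilon 1e-5, batch size 32 (Appendix A.1.1).
optimizer = torch.optim.Adam(q_net.parameters(), lr=5e-4, eps=1e-5)
BATCH_SIZE = 32
```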