Selectively Sharing Experiences Improves Multi-Agent Reinforcement Learning

Authors: Matthias Gerstgrasser, Tom Danino, Sarah Keren

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that our approach outperforms baseline no-sharing decentralized training and state-of-the-art multi-agent RL algorithms. Further, sharing only a small number of highly relevant experiences outperforms sharing all experiences between agents, and the performance uplift from selective experience sharing is robust across a range of hyperparameters and DQN variants. ... We evaluate the SUPER approach on a number of multi-agent benchmark domains. ... Figure 1: Performance of SUPER-dueling-DDQN variants with target bandwidth 0.1 on all domains.
Researcher Affiliation | Academia | Matthias Gerstgrasser, School of Engineering and Applied Sciences, Harvard University, and Computer Science Department, Stanford University (matthias@seas.harvard.edu); Tom Danino and Sarah Keren, The Taub Faculty of Computer Science, Technion - Israel Institute of Technology (tom.danino@campus.technion.ac.il, sarahk@cs.technion.ac.il)
Pseudocode | Yes | Algorithm 1 (SUPER algorithm for DQN): for each training iteration do: collect a batch of experiences b {DQN}; for each agent i, insert b_i into buffer_i {DQN}; for each agent i, select a subset b'_i ⊆ b_i of experiences to share (see the Experience Selection section) {SUPER} and, for each agent j ≠ i, insert b'_i into buffer_j {SUPER}; for each agent i, sample a train batch b̂_i from buffer_i and learn on it {DQN}; end for. A hedged Python sketch of this loop is given after the table.
Open Source Code | No | All source code is included in the appendix and will be made available on publication under an open-source license. We refer the reader to the included README file, which contains instructions to recreate the experiments discussed in this paper.
Open Datasets | Yes | We therefore run our experiments on several domains that are part of well-established benchmark packages. These include three domains from the Petting Zoo package [38], three domains from the Melting Pot package [18], and a two-player variant of the Atari 2600 game Space Invaders.
Dataset Splits | No | No explicit mention of validation dataset splits was found. The paper discusses train and test datasets, and hyperparameter tuning, but not a separate validation split.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) were explicitly mentioned for running the experiments.
Software Dependencies | Yes | We performed all experiments using the open-source library RLlib [20]. Experiments in Figures 1 and 6 were run using RLlib version 2.0.0; experiments in other figures were run using version 1.13.0.
Experiment Setup | Yes | Table 2: Hyperparameter Configuration Table, SISL: Pursuit Environment Parameters ... Table 3: Hyperparameter Configuration Table, MAgent: Battle Environment Parameters ... Table 4: Hyperparameter Configuration Table, MAgent: Adversarial Pursuit Environment Parameters ... Table 5: Hyperparameter Configuration Table, Melting Pot Policy Network ... Table 6: Hyperparameter Configuration Table, Atari Policy Network
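
As a reading aid for the Pseudocode row above, the following is a minimal Python sketch of the SUPER training iteration under stated assumptions. The ReplayBuffer class, the collect_fn / td_error_fn / learn_fn callbacks, the share_budget parameter, and the TD-error-based selection rule are illustrative placeholders, not the authors' code; the published experiments use RLlib's DQN implementations (versions noted in the Software Dependencies row), and the paper describes its own experience-selection criteria.

import random
from collections import deque

class ReplayBuffer:
    """Minimal FIFO replay buffer (illustrative stand-in for a DQN buffer)."""
    def __init__(self, capacity=10_000):
        self.storage = deque(maxlen=capacity)

    def insert(self, experiences):
        self.storage.extend(experiences)

    def sample(self, batch_size):
        # Uniform sampling; the paper's experiments also cover prioritized variants.
        return random.sample(list(self.storage), min(batch_size, len(self.storage)))

def select_to_share(experiences, td_errors, budget):
    """Pick the `budget` experiences with the highest TD error.
    This is one plausible notion of a 'highly relevant' experience;
    the paper defines its own selection schemes."""
    ranked = sorted(zip(td_errors, experiences), key=lambda pair: pair[0], reverse=True)
    return [exp for _, exp in ranked[:budget]]

def super_training_iteration(agents, buffers, collect_fn, td_error_fn, learn_fn,
                             share_budget=8, train_batch_size=32):
    """One training iteration following the structure of Algorithm 1."""
    # DQN: each agent collects a batch of experiences and stores it in its own buffer.
    batches = {i: collect_fn(i) for i in agents}
    for i in agents:
        buffers[i].insert(batches[i])

    # SUPER: each agent shares a small, highly relevant subset with every other agent.
    for i in agents:
        shared = select_to_share(batches[i], td_error_fn(i, batches[i]), share_budget)
        for j in agents:
            if j != i:
                buffers[j].insert(shared)

    # DQN: each agent samples from its (now augmented) buffer and learns.
    for i in agents:
        train_batch = buffers[i].sample(train_batch_size)
        learn_fn(i, train_batch)

Because sharing happens at buffer-insertion time, the receiving agents' sampling and learning code is left untouched, which is why the {SUPER} steps can be wrapped around an otherwise standard decentralized DQN loop.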