Selectively Sharing Experiences Improves Multi-Agent Reinforcement Learning
Authors: Matthias Gerstgrasser, Tom Danino, Sarah Keren
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that our approach outperforms baseline no-sharing decentralized training and state-of-the-art multi-agent RL algorithms. Further, sharing only a small number of highly relevant experiences outperforms sharing all experiences between agents, and the performance uplift from selective experience sharing is robust across a range of hyperparameters and DQN variants. ... We evaluate the SUPER approach on a number of multiagent benchmark domains. ... Figure 1: Performance of SUPER-dueling-DDQN variants with target bandwidth 0.1 on all domains. |
| Researcher Affiliation | Academia | Matthias Gerstgrasser, School of Engineering and Applied Sciences, Harvard University, and Computer Science Department, Stanford University (matthias@seas.harvard.edu); Tom Danino and Sarah Keren, The Taub Faculty of Computer Science, Technion - Israel Institute of Technology (tom.danino@campus.technion.ac.il, sarahk@cs.technion.ac.il) |
| Pseudocode | Yes | Algorithm 1 (SUPER algorithm for DQN): for each training iteration do: Collect a batch of experiences b {DQN}; for each agent i do: Insert b_i into buffer_i {DQN}; end for; for each agent i do: Select b̂_i ⊆ b_i of experiences to share¹ {SUPER}; for each agent j ≠ i do: Insert b̂_i into buffer_j {SUPER}; end for; end for; for each agent i do: Sample a train batch b_i from buffer_i {DQN}; Learn on train batch b_i {DQN}; end for; end for. ¹ See the Experience Selection section. (A Python sketch of this loop appears below the table.) |
| Open Source Code | No | All source code is included in the appendix and will be made available on publication under an open-source license. We refer the reader to the included README file, which contains instructions to recreate the experiments discussed in this paper. |
| Open Datasets | Yes | We therefore run our experiments on several domains that are part of well-established benchmark packages. These include three domains from the PettingZoo package [38], three domains from the Melting Pot package [18], and a two-player variant of the Atari 2600 game Space Invaders. (A loading sketch for one of these domains appears below the table.) |
| Dataset Splits | No | No explicit mention of validation dataset splits was found. The paper discusses train and test datasets, and hyperparameter tuning, but not a separate validation split. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) were explicitly mentioned for running the experiments. |
| Software Dependencies | Yes | We performed all experiments using the open-source library RLlib [20]. Experiments in Figures 1 and 6 were run using RLlib version 2.0.0; experiments in other figures were run using version 1.13.0. (A hedged RLlib configuration sketch appears below the table.) |
| Experiment Setup | Yes | Table 2: Hyperparameter Configuration Table SISL: Pursuit Environment Parameters ... Table 3: Hyperparameter Configuration Table MAgent: Battle Environment Parameters ... Table 4: Hyperparameter Configuration Table MAgent: Adversarial Pursuit Environment Parameters ... Table 5: Hyperparameter Configuration Table Melting Pot Policy Network ... Table 6: Hyperparameter Configuration Table Atari Policy Network |
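
The pseudocode quoted above describes a per-iteration loop: each agent stores its own experiences, forwards a selected subset to every other agent's replay buffer, and then trains as in standard DQN. Below is a minimal Python sketch of that loop; the agent interface, the `ReplayBuffer` class, and the TD-error-based `select_to_share` heuristic are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import deque


class ReplayBuffer:
    """Toy FIFO replay buffer standing in for each agent's DQN buffer."""

    def __init__(self, capacity=100_000):
        self.storage = deque(maxlen=capacity)

    def insert(self, experiences):
        self.storage.extend(experiences)

    def sample(self, batch_size):
        return random.sample(list(self.storage), min(batch_size, len(self.storage)))


def select_to_share(experiences, bandwidth=0.1):
    # Keep only the most "relevant" experiences, here approximated by absolute
    # TD error (an assumption; the paper's actual criteria are described in its
    # Experience Selection section). `bandwidth` mirrors the target bandwidth
    # of 0.1 reported for Figure 1.
    k = max(1, int(bandwidth * len(experiences)))
    return sorted(experiences, key=lambda e: abs(e["td_error"]), reverse=True)[:k]


def super_training_iteration(agents, buffers, batch_size=32):
    # DQN: every agent collects experiences and inserts them into its own buffer.
    collected = {i: agent.collect_experiences() for i, agent in agents.items()}
    for i, experiences in collected.items():
        buffers[i].insert(experiences)

    # SUPER: every agent shares its selected experiences with all other agents.
    for i, experiences in collected.items():
        shared = select_to_share(experiences)
        for j in buffers:
            if j != i:
                buffers[j].insert(shared)

    # DQN: every agent samples from its (augmented) buffer and learns.
    for i, agent in agents.items():
        agent.learn_on_batch(buffers[i].sample(batch_size))
```

Here `collect_experiences()` and `learn_on_batch()` are hypothetical agent methods; the point of the sketch is only the ordering of the insert, share, and learn steps from Algorithm 1.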
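For the Open Datasets row, the benchmark domains are distributed as standard Python packages. As one hedged example, the SISL Pursuit domain can be instantiated from PettingZoo roughly as follows; the exact environment version and constructor arguments used in the paper are not stated in the excerpt and are assumptions here.

```python
from pettingzoo.sisl import pursuit_v4

# Parallel API: all agents act simultaneously, matching the multi-agent DQN setup.
env = pursuit_v4.parallel_env()
obs = env.reset()  # recent PettingZoo versions return (observations, infos) instead
print(env.possible_agents)  # agent ids that would each own a replay buffer under SUPER
```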
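For the Software Dependencies and Experiment Setup rows, a dueling double-DQN configuration in the RLlib 2.0.0 API might look roughly like the sketch below. The environment id and all hyperparameter values are placeholders rather than the values from the paper's Tables 2-6, and the SUPER sharing step itself is not part of stock RLlib.

```python
from ray.rllib.algorithms.dqn import DQNConfig

config = (
    DQNConfig()
    .environment(env="pursuit_v4")    # placeholder env id; registration not shown
    .training(
        dueling=True,                 # dueling double-DQN, as in the SUPER-dueling-DDQN variant
        double_q=True,
        lr=1e-4,                      # placeholder value
        gamma=0.99,                   # placeholder value
        train_batch_size=32,          # placeholder value
    )
    .rollouts(num_rollout_workers=2)  # placeholder value
)
algo = config.build()
```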