Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Selectively Sharing Experiences Improves Multi-Agent Reinforcement Learning

Authors: Matthias Gerstgrasser, Tom Danino, Sarah Keren

NeurIPS 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We show that our approach outperforms baseline no-sharing decentralized training and state-of-the-art multi-agent RL algorithms. Further, sharing only a small number of highly relevant experiences outperforms sharing all experiences between agents, and the performance uplift from selective experience sharing is robust across a range of hyperparameters and DQN variants. ... We evaluate the SUPER approach on a number of multiagent benchmark domains. ... Figure 1: Performance of SUPER-dueling-DDQN variants with target bandwidth 0.1 on all domains.
Researcher Affiliation Academia Matthias Gerstgrasser (School of Engineering and Applied Sciences, Harvard University; Computer Science Department, Stanford University); Tom Danino and Sarah Keren (The Taub Faculty of Computer Science, Technion - Israel Institute of Technology)
Pseudocode Yes Algorithm 1: SUPER algorithm for DQN
  for each training iteration do
    Collect a batch of experiences b {DQN}
    for each agent i do
      Insert b_i into buffer_i {DQN}
    end for
    for each agent i do
      Select a subset b'_i ⊆ b_i of experiences to share¹ {SUPER}
      for each agent j ≠ i do
        Insert b'_i into buffer_j {SUPER}
      end for
    end for
    for each agent i do
      Sample a train batch from buffer_i {DQN}
      Learn on the train batch {DQN}
    end for
  end for
  ¹ See the paper's Experience Selection section.
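The loop above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the `Agent` class, the relevance score (absolute reward here), and the fixed share budget `share_k` are all assumptions for the sketch; the paper's actual selection criterion is defined in its Experience Selection section, and its experiments use RLlib's DQN replay buffers rather than a plain deque.

```python
import heapq
import random
from collections import deque

class Agent:
    """Toy stand-in for a DQN learner: just holds a bounded replay buffer."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def insert(self, experiences):
        self.buffer.extend(experiences)

def select_to_share(batch, k, relevance):
    """SUPER step: pick the k most relevant experiences from an agent's batch."""
    return heapq.nlargest(k, batch, key=relevance)

def super_iteration(agents, collect_batch, relevance, share_k):
    # DQN phase: each agent stores its own freshly collected batch.
    batches = [collect_batch(i) for i in range(len(agents))]
    for agent, batch in zip(agents, batches):
        agent.insert(batch)
    # SUPER phase: each agent sends its top-k experiences to every other agent.
    for i, batch in enumerate(batches):
        shared = select_to_share(batch, share_k, relevance)
        for j, other in enumerate(agents):
            if j != i:
                other.insert(shared)

# Usage: experiences are (obs, action, reward, next_obs, done) tuples;
# relevance is |reward|, a placeholder for whatever score the selection rule uses.
agents = [Agent() for _ in range(3)]
def collect_batch(i):
    return [((i, t), 0, random.uniform(-1, 1), (i, t + 1), False) for t in range(32)]
super_iteration(agents, collect_batch, relevance=lambda e: abs(e[2]), share_k=3)
print(len(agents[0].buffer))  # 32 own + 3 from each of 2 peers = 38
```

Note the asymmetry the algorithm relies on: all experiences an agent collects enter its own buffer, while only the small selected subset crosses agent boundaries, which is what keeps the sharing bandwidth low.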
Open Source Code No All source code is included in the appendix and will be made available on publication under an open-source license. We refer the reader to the included README file, which contains instructions to recreate the experiments discussed in this paper.
Open Datasets Yes We therefore run our experiments on several domains that are part of well-established benchmark packages. These include three domains from the Petting Zoo package [38], three domains from the Melting Pot package [18], and a two-player variant of the Atari 2600 game Space Invaders.
Dataset Splits No No explicit mention of validation dataset splits was found. The paper discusses train and test datasets, and hyperparameter tuning, but not a separate validation split.
Hardware Specification No No specific hardware details (e.g., GPU/CPU models, memory) were explicitly mentioned for running the experiments.
Software Dependencies Yes We performed all experiments using the open-source library RLlib [20]. Experiments in Figures 1 and 6 were run using RLlib version 2.0.0; experiments in other figures were run using version 1.13.0.
Experiment Setup Yes Hyperparameter configuration tables are provided for each setting: Table 2 (SISL: Pursuit environment parameters), Table 3 (MAgent: Battle environment parameters), Table 4 (MAgent: Adversarial Pursuit environment parameters), Table 5 (Melting Pot policy network), and Table 6 (Atari policy network).