ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward

Authors: Zixian Ma, Rose Wang, Fei-Fei Li, Michael Bernstein, Ranjay Krishna

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the efficacy of our approach across 6 tasks in the multi-agent particle and the complex Google Research football environments, comparing ELIGN to sparse and curiosity-based intrinsic rewards.
Researcher Affiliation | Academia | Stanford University, University of Washington
Pseudocode | Yes | Algorithm 1 ELIGN: Expectation Alignment (a hedged sketch follows the table)
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See the Appendix
Open Datasets | Yes | We evaluate ELIGN across both cooperative and competitive tasks in the multi-agent particle environment (Mordatch & Abbeel, 2017; Lowe et al., 2017) and the Google Research football environment (Kurach et al., 2019).
Dataset Splits | No | The paper mentions training until 'the best evaluation episode reward hasn't changed for 100 epochs,' implying some form of evaluation during training, but it does not explicitly define a separate validation split, with percentages or counts, distinct from the test set.
Hardware Specification | Yes | For the Multi-agent particle environment, each experiment uses one Tesla K40 GPU to train until convergence, i.e. the best evaluation episode reward hasn't changed for 100 epochs.
Software Dependencies | No | The paper does not provide version numbers for its software dependencies or libraries. It states 'We primarily use the multi-agent decentralized variant of the soft-actor critic algorithm' but gives no version details.
Experiment Setup | Yes | All the hyperparameters used in the training can be found in the Appendix. For the Multi-agent particle environment, each experiment uses one Tesla K40 GPU to train until convergence, i.e. the best evaluation episode reward hasn't changed for 100 epochs. Each epoch equates to 4K episodes of 25 timesteps. We train all algorithms with 5 random seeds.
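The Pseudocode row refers to Algorithm 1 (ELIGN: Expectation Alignment). As a rough illustration only, the sketch below shows one way an expectation-alignment intrinsic reward could be computed for a single agent: the `predict_next_obs` interface, the negative-L2 form, and the averaging over teammates' expectation models are assumptions made for this sketch, not the paper's exact formulation; see Algorithm 1 in the paper for the actual procedure.

```python
import numpy as np

def alignment_intrinsic_reward(expectation_models, obs, action, next_obs):
    """Negative mean prediction error of teammates' expectations about this agent.

    `expectation_models` are hypothetical objects exposing
    predict_next_obs(obs, action) -> np.ndarray; this interface is assumed
    purely for illustration.
    """
    if not expectation_models:
        return 0.0
    errors = [
        np.linalg.norm(model.predict_next_obs(obs, action) - next_obs)
        for model in expectation_models
    ]
    # Higher alignment (lower prediction error) -> higher intrinsic reward.
    return -float(np.mean(errors))
```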
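The convergence criterion quoted in the Hardware Specification and Experiment Setup rows (stop once the best evaluation episode reward hasn't changed for 100 epochs) reads as patience-based early stopping. The sketch below is one common reading of that criterion; the callable interface, function names, and the "improvement" interpretation of "changed" are illustrative assumptions, not taken from the paper.

```python
def train_until_convergence(train_one_epoch, evaluate, patience=100, max_epochs=10_000):
    """Run training epochs until the best evaluation reward stops improving.

    `train_one_epoch` and `evaluate` are illustrative callables: one epoch here
    stands in for the paper's 4K episodes of 25 timesteps, and `evaluate` is
    assumed to return the evaluation episode reward.
    """
    best_eval_reward = float("-inf")
    epochs_without_improvement = 0
    for _ in range(max_epochs):
        train_one_epoch()
        eval_reward = evaluate()
        if eval_reward > best_eval_reward:
            best_eval_reward = eval_reward
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break  # the paper's criterion: best eval reward unchanged for 100 epochs
    return best_eval_reward
```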