ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward
Authors: Zixian Ma, Rose Wang, Fei-Fei Li, Michael Bernstein, Ranjay Krishna
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of our approach across 6 tasks in the multi-agent particle and the complex Google Research football environments, comparing ELIGN to sparse and curiosity-based intrinsic rewards. |
| Researcher Affiliation | Academia | Stanford University, University of Washington |
| Pseudocode | Yes | Algorithm 1 ELIGN: Expectation Alignment |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See the Appendix |
| Open Datasets | Yes | We evaluate ELIGN across both cooperative and competitive tasks in the multi-agent particle environment (Mordatch & Abbeel, 2017; Lowe et al., 2017) and the Google Research football environment (Kurach et al., 2019). |
| Dataset Splits | No | The paper mentions training until 'the best evaluation episode reward hasn't changed for 100 epochs,' implying some form of evaluation during training, but it does not explicitly define a separate 'validation' dataset split with percentages or counts, distinct from the test set. |
| Hardware Specification | Yes | For the Multi-agent particle environment, each experiment uses one Tesla K40 GPU to train until convergence, i.e. the best evaluation episode reward hasn't changed for 100 epochs. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries. It mentions 'We primarily use the multi-agent decentralized variant of the soft-actor critic algorithm' but without version details. |
| Experiment Setup | Yes | All the hyperparameters used in the training can be found in the Appendix. For the Multi-agent particle environment, each experiment uses one Tesla K40 GPU to train until convergence, i.e. the best evaluation episode reward hasn't changed for 100 epochs. Each epoch equates to 4K episodes of 25 timesteps. We train all algorithms with 5 random seeds. |
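
The convergence criterion quoted in the Hardware Specification and Experiment Setup rows (stop once the best evaluation episode reward has not changed for 100 epochs) can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the authors' implementation; the function names `run_epoch` and `evaluate` are hypothetical placeholders.

```python
# Minimal sketch of the stopping rule described in the paper:
# stop training once the best evaluation episode reward has not
# improved for 100 consecutive epochs. Names are illustrative only.
PATIENCE_EPOCHS = 100  # "hasn't changed for 100 epochs"

def train_until_convergence(run_epoch, evaluate):
    """run_epoch() trains one epoch (4K episodes of 25 timesteps);
    evaluate() returns the mean evaluation episode reward."""
    best_reward = float("-inf")
    epochs_since_improvement = 0
    epoch = 0
    while epochs_since_improvement < PATIENCE_EPOCHS:
        run_epoch()
        reward = evaluate()
        if reward > best_reward:
            best_reward = reward
            epochs_since_improvement = 0
        else:
            epochs_since_improvement += 1
        epoch += 1
    return best_reward, epoch
```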