Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward

Authors: Zixian Ma, Rose Wang, Fei-Fei Li, Michael Bernstein, Ranjay Krishna

NeurIPS 2022 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We demonstrate the efficacy of our approach across 6 tasks in the multi-agent particle and the complex Google Research football environments, comparing ELIGN to sparse and curiosity-based intrinsic rewards." |
| Researcher Affiliation | Academia | Stanford University, University of Washington |
| Pseudocode | Yes | Algorithm 1 ELIGN: Expectation Alignment |
| Open Source Code | Yes | "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See the Appendix" |
| Open Datasets | Yes | "We evaluate ELIGN across both cooperative and competitive tasks in the multi-agent particle environment (Mordatch & Abbeel, 2017; Lowe et al., 2017) and the Google Research football environment (Kurach et al., 2019)." |
| Dataset Splits | No | The paper mentions training until "the best evaluation episode reward hasn't changed for 100 epochs," implying some form of evaluation during training, but it does not explicitly define a separate validation split, with percentages or counts, distinct from the test set. |
| Hardware Specification | Yes | "For the Multi-agent particle environment, each experiment uses one Tesla K40 GPU to train until convergence, i.e. the best evaluation episode reward hasn't changed for 100 epochs." |
| Software Dependencies | No | The paper does not provide version numbers for software dependencies or libraries. It mentions "We primarily use the multi-agent decentralized variant of the soft-actor critic algorithm" but without version details. |
| Experiment Setup | Yes | "All the hyperparameters used in the training can be found in the Appendix. For the Multi-agent particle environment, each experiment uses one Tesla K40 GPU to train until convergence, i.e. the best evaluation episode reward hasn't changed for 100 epochs. Each epoch equates to 4K episodes of 25 timesteps. We train all algorithms with 5 random seeds." |
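The convergence criterion quoted above (stop once the best evaluation episode reward has not improved for 100 epochs) is a standard patience-based early-stopping loop. A minimal sketch, assuming a hypothetical `evaluate` callable that returns the evaluation episode reward for a given epoch (the paper's actual training loop is not shown):

```python
def train_until_convergence(evaluate, max_epochs=100_000, patience=100):
    """Run epochs until the best evaluation episode reward hasn't
    improved for `patience` consecutive epochs (100 in the paper)."""
    best_reward = float("-inf")
    epochs_since_best = 0
    for epoch in range(max_epochs):
        reward = evaluate(epoch)  # hypothetical: eval reward for this epoch
        if reward > best_reward:
            best_reward = reward
            epochs_since_best = 0  # improvement resets the patience counter
        else:
            epochs_since_best += 1
        if epochs_since_best >= patience:
            return epoch, best_reward  # converged
    return max_epochs - 1, best_reward  # hit the epoch budget
```

For example, with a reward curve that plateaus at 10 after epoch 10 (`lambda e: min(e, 10)`), the loop stops 100 epochs after the last improvement.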