ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward
Authors: Zixian Ma, Rose Wang, Fei-Fei Li, Michael Bernstein, Ranjay Krishna
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of our approach across 6 tasks in the multi-agent particle and the complex Google Research football environments, comparing ELIGN to sparse and curiosity-based intrinsic rewards. |
| Researcher Affiliation | Academia | Stanford University, University of Washington |
| Pseudocode | Yes | Algorithm 1 ELIGN: Expectation Alignment |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See the Appendix |
| Open Datasets | Yes | We evaluate ELIGN across both cooperative and competitive tasks in the multi-agent particle environment (Mordatch & Abbeel, 2017; Lowe et al., 2017) and the Google Research football environment (Kurach et al., 2019). |
| Dataset Splits | No | The paper mentions training until 'the best evaluation episode reward hasn't changed for 100 epochs,' implying some form of evaluation during training, but it does not explicitly define a separate 'validation' dataset split with percentages or counts, distinct from the test set. |
| Hardware Specification | Yes | For the Multi-agent particle environment, each experiment uses one Tesla K40 GPU to train until convergence, i.e. the best evaluation episode reward hasn't changed for 100 epochs. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries. It mentions 'We primarily use the multi-agent decentralized variant of the soft-actor critic algorithm' but without version details. |
| Experiment Setup | Yes | All the hyperparameters used in the training can be found in the Appendix. For the Multi-agent particle environment, each experiment uses one Tesla K40 GPU to train until convergence, i.e. the best evaluation episode reward hasn't changed for 100 epochs. Each epoch equates to 4K episodes of 25 timesteps. We train all algorithms with 5 random seeds. |
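
The convergence criterion quoted in the Hardware Specification and Experiment Setup rows (stop once the best evaluation episode reward has not changed for 100 epochs) can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the authors' implementation; the function names `run_epoch` and `evaluate` are hypothetical placeholders.

```python
# Minimal sketch of the stopping rule described in the paper:
# stop training once the best evaluation episode reward has not
# improved for 100 consecutive epochs. Names are illustrative only.
PATIENCE_EPOCHS = 100  # "hasn't changed for 100 epochs"

def train_until_convergence(run_epoch, evaluate):
    """run_epoch() trains one epoch (4K episodes of 25 timesteps);
    evaluate() returns the mean evaluation episode reward."""
    best_reward = float("-inf")
    epochs_since_improvement = 0
    epoch = 0
    while epochs_since_improvement < PATIENCE_EPOCHS:
        run_epoch()
        reward = evaluate()
        if reward > best_reward:
            best_reward = reward
            epochs_since_improvement = 0
        else:
            epochs_since_improvement += 1
        epoch += 1
    return best_reward, epoch
```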