Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates; a minimal sketch of such a classification call follows the table below. Full accuracy metrics and methodology are described in [1].
Equivariant Networks for Zero-Shot Coordination
Authors: Darius Muglich, Christian Schroeder de Witt, Elise van der Pol, Shimon Whiteson, Jakob Foerster
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide theoretical guarantees of our work and test on the AI benchmark task of Hanabi, where we demonstrate our methods outperforming other symmetry-aware baselines in zero-shot coordination, as well as being able to improve the coordination ability of a variety of pre-trained policies. |
| Researcher Affiliation | Collaboration | Darius Muglich (University of Oxford); Christian Schroeder de Witt (FLAIR, University of Oxford); Elise van der Pol (Microsoft Research AI4Science); Shimon Whiteson (University of Oxford); Jakob Foerster (FLAIR, University of Oxford) |
| Pseudocode | No | The paper describes an algorithmic approach in paragraph form but does not provide a formal pseudocode block or algorithm figure. |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] Section 4, the Appendix, and the supplementary material |
| Open Datasets | Yes | The primary test bed for our methodology is the AI benchmark task Hanabi [5]. |
| Dataset Splits | No | The paper describes training and evaluation on the Hanabi task (a reinforcement learning environment) and uses terms like 'self-play scores' and 'evaluated over 5000 games'. However, it does not specify explicit train/validation/test dataset splits with percentages or sample counts, which are more typical of supervised learning contexts. |
| Hardware Specification | No | The acknowledgements state: 'The experiments were made possible by a generous equipment grant from NVIDIA.' and the ethics statement confirms: 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See the Appendix'. While NVIDIA is mentioned, specific GPU models or detailed hardware specifications are not provided in the main text. |
| Software Dependencies | No | The paper mentions several algorithms and frameworks like 'R2D2' [25], 'VDN' [43], 'SAD' [21], 'Adam' optimizer [26], and 'LSTM' [20]. However, it does not provide specific version numbers for any of these software components, libraries, or programming languages. |
| Experiment Setup | No | For further details of the training setup and hyperparameters used in the following experiments, please refer to the Appendix. |
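
For readers curious how rows like the ones above might be produced, here is a minimal, hypothetical Python sketch of the kind of LLM call the notice describes. The model name, prompt wording, label sets, and JSON schema are all illustrative assumptions; the actual pipeline and its validation against the manually labeled dataset are described in [1].

```python
# Hypothetical sketch of one step of an LLM-based reproducibility classifier,
# in the spirit of the pipeline the notice describes. All names below
# (model choice, prompt wording, label sets) are assumptions, not the
# pipeline from [1].
import json
from openai import OpenAI  # assumes the openai>=1.0 Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Example label sets for two of the variables shown in the table above.
LABELS = {
    "Research Type": ["Experimental", "Theoretical", "Both"],
    "Open Source Code": ["Yes", "No"],
}

def classify_variable(paper_text: str, variable: str) -> dict:
    """Label one reproducibility variable and quote the supporting passage."""
    prompt = (
        f"For the reproducibility variable '{variable}', choose exactly one "
        f"label from {LABELS[variable]} and quote the passage of the paper "
        "that supports your choice. Respond as JSON with keys "
        '"label" and "evidence".\n\n--- PAPER ---\n' + paper_text
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any JSON-capable chat model works
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # force parseable output
    )
    return json.loads(response.choices[0].message.content)

# Usage: each (label, evidence) pair fills one row of the table above, e.g.
#   result = classify_variable(open("paper.txt").read(), "Open Source Code")
#   print(result["label"], "-", result["evidence"])
```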