Self-Explaining Deviations for Coordination

Authors: Hengyuan Hu, Samuel Sokota, David Wu, Anton Bakhtin, Andrei Lupu, Brandon Cui, Jakob Foerster

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Lastly, we evaluate IMPROVISED both in an illustrative toy setting and in the popular benchmark setting Hanabi, where we show that it can produce so-called finesse plays. We test IMPROVISED in two different settings. The first is the trampoline-tiger game explained earlier; the second applies IMPROVISED to three-player Hanabi, starting from a blueprint trained on human data.
Researcher Affiliation | Collaboration | Hengyuan Hu, Stanford University (hengyuan@cs.stanford.edu); Samuel Sokota, Carnegie Mellon University (ssokota@andrew.cmu.edu); David Wu, Meta AI (dwu@meta.com); Anton Bakhtin, Meta AI (yolo@meta.com); Andrei Lupu, Meta AI & FLAIR, University of Oxford (alupu@meta.com); Brandon Cui, MosaicML (brandon@mosaicml.com); Jakob N. Foerster, FLAIR, University of Oxford (jakob.foerster@eng.ox.ac.uk)
Pseudocode | Yes | Please refer to Appendix A for the detailed pseudocode. (A rough, hypothetical sketch of the deviation test appears after this table.)
Open Source Code | Yes | We provide the code for our Hanabi experiments at https://github.com/facebookresearch/off-belief-learning/blob/main/pyhanabi/finesse.py.
Open Datasets | Yes | Lastly, we present experiments on the large-scale benchmark Hanabi [1], where we show that IMPROVISED is able to produce finesse plays, one of the most interesting techniques that human experts perform frequently. To implement IMPROVISED in Hanabi, we first need a belief function from which we can sample game states, given either public or private knowledge of the game, in order to perform Monte Carlo rollouts. Luckily, the belief over possible hands in Hanabi can be computed analytically [8]. We use a blueprint policy to generate self-play games over a range of decks (game seeds). (See the belief-sampling sketch after this table.)
Dataset Splits | No | The paper describes how specific experimental situations (finesse-able and finesse-complete) are generated for evaluation, but it does not provide explicit training, validation, or test splits with percentages, counts, or a pre-defined split methodology for reproducibility.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory, or computing infrastructure) used to run its experiments.
Software Dependencies | No | The paper mentions pyhanabi for its Hanabi experiments and refers to prior works for agents (e.g., MAPPO, QMIX, SAD, Other-Play, OBL), but it does not list version numbers for any of the software components or libraries used in its own experimental setup.
Experiment Setup | Yes | The detailed hyper-parameters and computational cost are in Section C.
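
Two illustrative sketches follow. First, for the pseudocode row: the authors' exact IMPROVISED procedure is given in Appendix A of the paper and is not reproduced here. As orientation only, the core of such a deviation test can be framed as a Monte Carlo comparison between following the blueprint and playing a candidate off-blueprint action. Everything in the sketch below is hypothetical: the function names (sample_states, rollout_value, follower_after_deviation) are placeholders rather than the authors' API, and the defining self-explaining condition, that the partner can uniquely infer the deviator's intent, is only noted in the docstring.

```python
def deviation_beats_blueprint(
    obs,                       # the deviator's observation at the decision point
    deviation_action,          # candidate action the blueprint would not play
    blueprint,                 # blueprint(obs) -> action; the common-knowledge policy
    sample_states,             # sample_states(obs, n) -> n full states consistent with obs
    rollout_value,             # rollout_value(state, first_action, partner_policy) -> float return
    follower_after_deviation,  # partner policy reacting as if the deviation signalled intent
    n_samples=1000,
):
    """Monte Carlo comparison of blueprint play vs. a candidate deviation.

    Hypothetical sketch only. The full IMPROVISED procedure (Appendix A of
    the paper) additionally requires the deviation to be self-explaining,
    i.e. the partner must be able to uniquely infer the deviator's intent
    from observing the off-blueprint action.
    """
    states = sample_states(obs, n_samples)
    # Average rollout return when the deviator follows the blueprint.
    v_blueprint = sum(
        rollout_value(s, blueprint(obs), blueprint) for s in states
    ) / len(states)
    # Average rollout return under the joint deviation.
    v_deviation = sum(
        rollout_value(s, deviation_action, follower_after_deviation) for s in states
    ) / len(states)
    return v_deviation > v_blueprint
```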
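
Second, for the open-datasets row: the belief sampling it describes is easy to illustrate. Below is a minimal, self-contained sketch, on a toy card game of our own invention, of how the belief over a player's own hand can be computed analytically from the counts of unseen cards plus the hints received, and then sampled for Monte Carlo rollouts. The names and representation are illustrative and are not the released pyhanabi implementation, which handles full Hanabi states far more efficiently.

```python
import itertools
import random
from collections import Counter

def analytic_hand_belief(unseen_counts, consistent_with_hints, hand_size):
    """Exact belief over a player's own (ordered) hand.

    unseen_counts: Counter mapping card -> copies the player cannot see.
    consistent_with_hints: predicate over a candidate hand (tuple of cards).
    Returns {hand: probability}. Exhaustive enumeration, so only practical
    for toy hand sizes; the analytic Hanabi belief [8] is the same idea
    computed efficiently.
    """
    weights = {}
    for hand in itertools.product(list(unseen_counts), repeat=hand_size):
        if not consistent_with_hints(hand):
            continue
        # Weight = number of ways to draw exactly this ordered hand.
        left, w = Counter(unseen_counts), 1
        for card in hand:
            w *= left[card]  # becomes zero if no copies of this card remain
            left[card] -= 1
        if w > 0:
            weights[hand] = w
    total = sum(weights.values())
    return {hand: w / total for hand, w in weights.items()}

def sample_hands(belief, n):
    """Draw n hands from the belief, e.g. to seed Monte Carlo rollouts."""
    hands = list(belief)
    probs = [belief[h] for h in hands]
    return random.choices(hands, weights=probs, k=n)

# Toy usage: three unseen cards, a two-card hand, and a hint that slot 0 is red.
belief = analytic_hand_belief(
    Counter({"R1": 2, "G1": 1}),
    consistent_with_hints=lambda hand: hand[0].startswith("R"),
    hand_size=2,
)
rollout_hands = sample_hands(belief, n=5)
```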