Self-Explaining Deviations for Coordination

Authors: Hengyuan Hu, Samuel Sokota, David Wu, Anton Bakhtin, Andrei Lupu, Brandon Cui, Jakob Foerster

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Lastly, we evaluate IMPROVISED both in an illustrative toy setting and in the popular benchmark setting Hanabi, where we show that it can produce so-called finesse plays. We test IMPROVISED in two different settings. The first is the trampoline-tiger game explained earlier; the second applies IMPROVISED to three-player Hanabi, starting from a blueprint trained on human data.
Researcher Affiliation | Collaboration | Hengyuan Hu, Stanford University (hengyuan@cs.stanford.edu); Samuel Sokota, Carnegie Mellon University (ssokota@andrew.cmu.edu); David Wu, Meta AI (dwu@meta.com); Anton Bakhtin, Meta AI (yolo@meta.com); Andrei Lupu, Meta AI & FLAIR, University of Oxford (alupu@meta.com); Brandon Cui, MosaicML (brandon@mosaicml.com); Jakob N. Foerster, FLAIR, University of Oxford (jakob.foerster@eng.ox.ac.uk)
Pseudocode | Yes | Please refer to Appendix A for the detailed pseudocode. (A rough, hypothetical sketch of the deviation test appears after this table.)
Open Source Code | Yes | We provide the code for our Hanabi experiments at https://github.com/facebookresearch/off-belief-learning/blob/main/pyhanabi/finesse.py.
Open Datasets | Yes | Lastly, we present experiments on the large-scale benchmark Hanabi [1], where we show that IMPROVISED is able to produce finesse plays, one of the most interesting techniques that human experts perform frequently. To implement IMPROVISED in Hanabi, we first need a belief function from which we can sample game states, given either public or private knowledge of the game, in order to perform Monte Carlo rollouts. Luckily, the belief over possible hands in Hanabi can be computed analytically [8]. We use a blueprint policy to generate self-play games over a range of decks (game seeds). (See the belief-sampling sketch after this table.)
Dataset Splits | No | The paper describes how specific experimental situations (finesse-able and finesse-complete) are generated for evaluation, but it does not provide explicit training, validation, or test splits with percentages, counts, or a pre-defined split methodology for reproducibility.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory, or computing infrastructure) used to run its experiments.
Software Dependencies | No | The paper mentions pyhanabi for its Hanabi experiments and refers to prior works for agents (e.g., MAPPO, QMIX, SAD, Other-Play, OBL), but it does not list version numbers for any of the software components or libraries used in its own experimental setup.
Experiment Setup | Yes | The detailed hyper-parameters and computational cost are in Section C.
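
Two illustrative sketches follow. First, for the pseudocode row: the authors' exact IMPROVISED procedure is given in Appendix A of the paper and is not reproduced here. As orientation only, the core of such a deviation test can be framed as a Monte Carlo comparison between following the blueprint and playing a candidate off-blueprint action. Everything in the sketch below is hypothetical: the function names (sample_states, rollout_value, follower_after_deviation) are placeholders rather than the authors' API, and the defining self-explaining condition, that the partner can uniquely infer the deviator's intent, is only noted in the docstring.

```python
def deviation_beats_blueprint(
    obs,                       # the deviator's observation at the decision point
    deviation_action,          # candidate action the blueprint would not play
    blueprint,                 # blueprint(obs) -> action; the common-knowledge policy
    sample_states,             # sample_states(obs, n) -> n full states consistent with obs
    rollout_value,             # rollout_value(state, first_action, partner_policy) -> float return
    follower_after_deviation,  # partner policy reacting as if the deviation signalled intent
    n_samples=1000,
):
    """Monte Carlo comparison of blueprint play vs. a candidate deviation.

    Hypothetical sketch only. The full IMPROVISED procedure (Appendix A of
    the paper) additionally requires the deviation to be self-explaining,
    i.e. the partner must be able to uniquely infer the deviator's intent
    from observing the off-blueprint action.
    """
    states = sample_states(obs, n_samples)
    # Average rollout return when the deviator follows the blueprint.
    v_blueprint = sum(
        rollout_value(s, blueprint(obs), blueprint) for s in states
    ) / len(states)
    # Average rollout return under the joint deviation.
    v_deviation = sum(
        rollout_value(s, deviation_action, follower_after_deviation) for s in states
    ) / len(states)
    return v_deviation > v_blueprint
```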
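
Second, for the open-datasets row: the belief sampling it describes is easy to illustrate. Below is a minimal, self-contained sketch, on a toy card game of our own invention, of how the belief over a player's own hand can be computed analytically from the counts of unseen cards plus the hints received, and then sampled for Monte Carlo rollouts. The names and representation are illustrative and are not the released pyhanabi implementation, which handles full Hanabi states far more efficiently.

```python
import itertools
import random
from collections import Counter

def analytic_hand_belief(unseen_counts, consistent_with_hints, hand_size):
    """Exact belief over a player's own (ordered) hand.

    unseen_counts: Counter mapping card -> copies the player cannot see.
    consistent_with_hints: predicate over a candidate hand (tuple of cards).
    Returns {hand: probability}. Exhaustive enumeration, so only practical
    for toy hand sizes; the analytic Hanabi belief [8] is the same idea
    computed efficiently.
    """
    weights = {}
    for hand in itertools.product(list(unseen_counts), repeat=hand_size):
        if not consistent_with_hints(hand):
            continue
        # Weight = number of ways to draw exactly this ordered hand.
        left, w = Counter(unseen_counts), 1
        for card in hand:
            w *= left[card]  # becomes zero if no copies of this card remain
            left[card] -= 1
        if w > 0:
            weights[hand] = w
    total = sum(weights.values())
    return {hand: w / total for hand, w in weights.items()}

def sample_hands(belief, n):
    """Draw n hands from the belief, e.g. to seed Monte Carlo rollouts."""
    hands = list(belief)
    probs = [belief[h] for h in hands]
    return random.choices(hands, weights=probs, k=n)

# Toy usage: three unseen cards, a two-card hand, and a hint that slot 0 is red.
belief = analytic_hand_belief(
    Counter({"R1": 2, "G1": 1}),
    consistent_with_hints=lambda hand: hand[0].startswith("R"),
    hand_size=2,
)
rollout_hands = sample_hands(belief, n=5)
```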