Generalized Beliefs for Cooperative AI

Authors: Darius Muglich, Luisa M Zintgraf, Christian A Schroeder De Witt, Shimon Whiteson, Jakob Foerster

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Figure 2 shows the results of our generalized belief learning methodology (see Appendix C for an intuitive interpretation of cross-entropy scores over unobservable features in a Dec-POMDP), and Table 1 contains the results of our experiments on the generalized belief's ability to improve cross-play. (A hedged sketch of such a per-feature cross-entropy score appears after the table.)
Researcher Affiliation | Academia | University of Oxford, England, United Kingdom. Correspondence to: Darius Muglich <dariusm1997@yahoo.com>.
Pseudocode | No | The paper describes methodologies and processes but does not include any explicit pseudocode blocks or figures labeled 'Algorithm'.
Open Source Code | Yes | The code for learning a belief model with the transformer architecture may be found here: https://github.com/gfppoy/hanabi-belief-transformer.
Open Datasets | Yes | We use the AI benchmark task and representative Dec-POMDP Hanabi (Bard et al., 2020) for our experiments. We used the thirteen pre-trained simplified action decoder (SAD) policies from the work of Hu et al. (2020), which we downloaded from their GitHub repository.
Dataset Splits | No | The paper describes training on a collection of pre-trained policies and testing on policies not seen at training time, but it does not provide specific numerical train/validation/test dataset splits (e.g., percentages or sample counts) for the data used to train the belief model.
Hardware Specification | Yes | The machine used for experimentation consisted of 2 NVIDIA GeForce RTX 2080 Ti GPUs and 40 CPU cores.
Software Dependencies | No | The paper mentions using specific codebases and a ReLU nonlinearity but does not list version numbers for software dependencies such as Python, PyTorch, or other libraries.
Experiment Setup | Yes | Table 2 (hyperparameter settings of the transformer for belief emulation): number of layers 6; number of attention heads 8; state embedding dimension (d in Section 3) 256; feature embedding dimension (d_feature in Section 3) 128; maximum sequence length (T in Section 3) 80; feedforward network dimension 2048; nonlinearity ReLU; batch size 256; dropout 0.1; learning rate 2.5e-4; warm-up period 10^5 steps; learning rate decay inverse square root. (A hedged configuration sketch follows the table.)
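
The belief score quoted in the Research Type row is a cross entropy over unobservable features. As a minimal illustrative sketch, not the authors' code, the snippet below scores a predicted belief over hidden per-slot features, as in a Dec-POMDP such as Hanabi where a player's own hand is unobservable. The function name `belief_cross_entropy` and the tensor shapes are assumptions introduced here for illustration.

```python
import torch
import torch.nn.functional as F

def belief_cross_entropy(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Average cross entropy of a predicted belief over unobservable features.

    logits:  (batch, num_slots, num_values) -- one predicted distribution per
             hidden feature slot (e.g., per card in a Hanabi hand).
    targets: (batch, num_slots) -- integer indices of the true feature values.
    """
    batch, num_slots, num_values = logits.shape
    # Score every hidden slot independently, then average over slots and
    # batch; a lower score means a sharper, better-calibrated belief.
    return F.cross_entropy(
        logits.reshape(batch * num_slots, num_values),
        targets.reshape(batch * num_slots),
    )
```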
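
The hyperparameters in the Experiment Setup row fully specify a standard transformer encoder. Below is a minimal PyTorch sketch that wires the quoted Table 2 values into `nn.TransformerEncoder` together with the warm-up plus inverse-square-root learning-rate decay the table names. It is an assumed reconstruction rather than the authors' implementation: the feature embedding dimension, batch size, and maximum sequence length are listed as constants for completeness but not wired in, since the paper's embedding and data-loading code is not quoted here.

```python
import torch
import torch.nn as nn

# Values quoted from Table 2 of the paper.
NUM_LAYERS   = 6
NUM_HEADS    = 8
D_MODEL      = 256      # state embedding dimension (d in Section 3)
D_FEATURE    = 128      # feature embedding dimension (d_feature in Section 3)
MAX_SEQ_LEN  = 80       # maximum sequence length (T in Section 3)
D_FF         = 2048     # feedforward network dimension
DROPOUT      = 0.1
BATCH_SIZE   = 256
BASE_LR      = 2.5e-4
WARMUP_STEPS = 100_000  # warm-up period of 10^5 steps

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(
        d_model=D_MODEL,
        nhead=NUM_HEADS,
        dim_feedforward=D_FF,
        dropout=DROPOUT,
        activation="relu",
    ),
    num_layers=NUM_LAYERS,
)

optimizer = torch.optim.Adam(encoder.parameters(), lr=BASE_LR)

def inverse_sqrt(step: int) -> float:
    """Linear warm-up to BASE_LR, then inverse-square-root decay."""
    step = max(step, 1)
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    return (WARMUP_STEPS / step) ** 0.5

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=inverse_sqrt)
```

Calling scheduler.step() after each optimizer step makes the effective learning rate climb linearly to its 2.5e-4 peak at step 10^5 and decay as 1/sqrt(step) thereafter, matching the schedule named in Table 2.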