Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Generalized Beliefs for Cooperative AI

Authors: Darius Muglich, Luisa M Zintgraf, Christian A Schroeder De Witt, Shimon Whiteson, Jakob Foerster

ICML 2022 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Figure 2 shows the results of our generalized belief learning methodology (see Appendix C for an intuitive interpretation of cross entropy scores over unobservable features in a Dec-POMDP), and Table 1 contains the results of our experiments on the generalized belief's ability to improve cross-play.
Researcher Affiliation Academia 1University of Oxford, England, United Kingdom. Correspondence to: Darius Muglich <EMAIL>.
Pseudocode No The paper describes methodologies and processes but does not include any explicit pseudocode blocks or figures labeled 'Algorithm'.
Open Source Code Yes The code for learning a belief model with the transformer architecture may be found here: https://github.com/gfppoy/hanabi-belief-transformer.
Open Datasets Yes We use the AI benchmark task and representative Dec-POMDP Hanabi (Bard et al., 2020) for our experiments. We used thirteen pre-trained simplified action decoder (SAD) policies that were used in the work of Hu et al. (2020), and which we downloaded from their GitHub repository.
Dataset Splits No The paper describes training on a collection of pre-trained policies and testing on policies not seen at training time, but it does not provide specific numerical train/validation/test dataset splits (e.g., percentages or sample counts) for the data used to train the belief model.
Hardware Specification Yes The machine used for experimentation consisted of 2 NVIDIA GeForce RTX 2080 Ti GPUs and 40 CPU cores.
Software Dependencies No The paper mentions using specific codebases and a ReLU nonlinearity but does not list specific version numbers for software dependencies like Python, PyTorch, or other libraries.
Experiment Setup Yes Table 2. Hyperparameter settings of transformer for belief emulation: Number of layers: 6; Number of attention heads: 8; State embedding dimension (d in Section 3): 256; Feature embedding dimension (d_feature in Section 3): 128; Maximum sequence length (T in Section 3): 80; Feedforward network dimension: 2048; Nonlinearity: ReLU; Batch size: 256; Dropout: 0.1; Learning rate: 2.5 × 10^-4; Warm-up period: 10^5 steps; Learning rate decay: inverse square root.
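As a minimal sketch of the training setup above, the hyperparameters and the warm-up plus inverse-square-root learning-rate schedule from Table 2 can be expressed in plain Python. The paper states the schedule type but not its exact formula, so the linear-warm-up variant below (as popularized by fairseq's `inverse_sqrt` scheduler) is one plausible reading; the names `BeliefTransformerConfig` and `inverse_sqrt_lr` are illustrative, not from the paper or its codebase.

```python
from dataclasses import dataclass
import math


@dataclass
class BeliefTransformerConfig:
    # Values taken from Table 2 of the paper.
    num_layers: int = 6
    num_heads: int = 8
    state_embed_dim: int = 256     # d in Section 3
    feature_embed_dim: int = 128   # d_feature in Section 3
    max_seq_len: int = 80          # T in Section 3
    ffn_dim: int = 2048
    batch_size: int = 256
    dropout: float = 0.1
    peak_lr: float = 2.5e-4
    warmup_steps: int = 100_000    # 10^5 warm-up period


def inverse_sqrt_lr(step: int, cfg: BeliefTransformerConfig) -> float:
    """Linear warm-up to peak_lr, then inverse-square-root decay.

    Assumed formula: lr = peak_lr * step / warmup during warm-up,
    then lr = peak_lr * sqrt(warmup / step) afterwards.
    """
    if step < cfg.warmup_steps:
        return cfg.peak_lr * step / cfg.warmup_steps
    return cfg.peak_lr * math.sqrt(cfg.warmup_steps / step)
```

Under this reading, the learning rate reaches its peak of 2.5 × 10^-4 exactly at step 10^5 and has halved by step 4 × 10^5.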