Generalized Beliefs for Cooperative AI

Authors: Darius Muglich, Luisa M Zintgraf, Christian A Schroeder De Witt, Shimon Whiteson, Jakob Foerster

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Figure 2 shows the results of our generalized belief learning methodology (see Appendix C for an intuitive interpretation of cross-entropy scores over unobservable features in a Dec-POMDP), and Table 1 contains the results of our experiments on the generalized belief's ability to improve cross-play. (A hedged sketch of such a per-feature cross-entropy score appears after the table.)
Researcher Affiliation | Academia | University of Oxford, England, United Kingdom. Correspondence to: Darius Muglich <dariusm1997@yahoo.com>.
Pseudocode | No | The paper describes methodologies and processes but does not include any explicit pseudocode blocks or figures labeled 'Algorithm'.
Open Source Code | Yes | The code for learning a belief model with the transformer architecture may be found here: https://github.com/gfppoy/hanabi-belief-transformer.
Open Datasets | Yes | We use the AI benchmark task and representative Dec-POMDP Hanabi (Bard et al., 2020) for our experiments. We used the thirteen pre-trained simplified action decoder (SAD) policies from the work of Hu et al. (2020), which we downloaded from their GitHub repository.
Dataset Splits | No | The paper describes training on a collection of pre-trained policies and testing on policies not seen at training time, but it does not provide specific numerical train/validation/test dataset splits (e.g., percentages or sample counts) for the data used to train the belief model.
Hardware Specification | Yes | The machine used for experimentation consisted of 2 NVIDIA GeForce RTX 2080 Ti GPUs and 40 CPU cores.
Software Dependencies | No | The paper mentions using specific codebases and a ReLU nonlinearity but does not list version numbers for software dependencies such as Python, PyTorch, or other libraries.
Experiment Setup | Yes | Table 2 (hyperparameter settings of the transformer for belief emulation): number of layers 6; number of attention heads 8; state embedding dimension (d in Section 3) 256; feature embedding dimension (d_feature in Section 3) 128; maximum sequence length (T in Section 3) 80; feedforward network dimension 2048; nonlinearity ReLU; batch size 256; dropout 0.1; learning rate 2.5e-4; warm-up period 10^5 steps; learning rate decay inverse square root. (A hedged configuration sketch follows the table.)
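
The belief score quoted in the Research Type row is a cross entropy over unobservable features. As a minimal illustrative sketch, not the authors' code, the snippet below scores a predicted belief over hidden per-slot features, as in a Dec-POMDP such as Hanabi where a player's own hand is unobservable. The function name `belief_cross_entropy` and the tensor shapes are assumptions introduced here for illustration.

```python
import torch
import torch.nn.functional as F

def belief_cross_entropy(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Average cross entropy of a predicted belief over unobservable features.

    logits:  (batch, num_slots, num_values) -- one predicted distribution per
             hidden feature slot (e.g., per card in a Hanabi hand).
    targets: (batch, num_slots) -- integer indices of the true feature values.
    """
    batch, num_slots, num_values = logits.shape
    # Score every hidden slot independently, then average over slots and
    # batch; a lower score means a sharper, better-calibrated belief.
    return F.cross_entropy(
        logits.reshape(batch * num_slots, num_values),
        targets.reshape(batch * num_slots),
    )
```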
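
The hyperparameters in the Experiment Setup row fully specify a standard transformer encoder. Below is a minimal PyTorch sketch that wires the quoted Table 2 values into `nn.TransformerEncoder` together with the warm-up plus inverse-square-root learning-rate decay the table names. It is an assumed reconstruction rather than the authors' implementation: the feature embedding dimension, batch size, and maximum sequence length are listed as constants for completeness but not wired in, since the paper's embedding and data-loading code is not quoted here.

```python
import torch
import torch.nn as nn

# Values quoted from Table 2 of the paper.
NUM_LAYERS   = 6
NUM_HEADS    = 8
D_MODEL      = 256      # state embedding dimension (d in Section 3)
D_FEATURE    = 128      # feature embedding dimension (d_feature in Section 3)
MAX_SEQ_LEN  = 80       # maximum sequence length (T in Section 3)
D_FF         = 2048     # feedforward network dimension
DROPOUT      = 0.1
BATCH_SIZE   = 256
BASE_LR      = 2.5e-4
WARMUP_STEPS = 100_000  # warm-up period of 10^5 steps

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(
        d_model=D_MODEL,
        nhead=NUM_HEADS,
        dim_feedforward=D_FF,
        dropout=DROPOUT,
        activation="relu",
    ),
    num_layers=NUM_LAYERS,
)

optimizer = torch.optim.Adam(encoder.parameters(), lr=BASE_LR)

def inverse_sqrt(step: int) -> float:
    """Linear warm-up to BASE_LR, then inverse-square-root decay."""
    step = max(step, 1)
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    return (WARMUP_STEPS / step) ** 0.5

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=inverse_sqrt)
```

Calling scheduler.step() after each optimizer step makes the effective learning rate climb linearly to its 2.5e-4 peak at step 10^5 and decay as 1/sqrt(step) thereafter, matching the schedule named in Table 2.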