Generalized Beliefs for Cooperative AI
Authors: Darius Muglich, Luisa M Zintgraf, Christian A Schroeder De Witt, Shimon Whiteson, Jakob Foerster
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 2 shows the results of our generalized belief learning methodology (see Appendix C for an intuitive interpretation of cross-entropy scores over unobservable features in a Dec-POMDP), and Table 1 contains the results of our experiments on the generalized belief's ability to improve cross-play. |
| Researcher Affiliation | Academia | University of Oxford, England, United Kingdom. Correspondence to: Darius Muglich <dariusm1997@yahoo.com>. |
| Pseudocode | No | The paper describes methodologies and processes but does not include any explicit pseudocode blocks or figures labeled 'Algorithm'. |
| Open Source Code | Yes | The code for learning a belief model with the transformer architecture may be found here: https://github.com/gfppoy/hanabi-belief-transformer. |
| Open Datasets | Yes | We use the AI benchmark task and representative Dec-POMDP Hanabi (Bard et al., 2020) for our experiments. We used thirteen pre-trained simplified action decoder (SAD) policies that were used in the work of Hu et al. (2020), and which we downloaded from their GitHub repository. |
| Dataset Splits | No | The paper describes training on a collection of pre-trained policies and testing on policies not seen at training time, but it does not provide specific numerical train/validation/test dataset splits (e.g., percentages or sample counts) for the data used to train the belief model. |
| Hardware Specification | Yes | The machine used for experimentation consisted of 2 NVIDIA GeForce RTX 2080 Ti GPUs and 40 CPU cores. |
| Software Dependencies | No | The paper mentions using specific codebases and a ReLU nonlinearity but does not list specific version numbers for software dependencies like Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | Table 2. Hyperparameter settings of the transformer for belief emulation: number of layers 6; number of attention heads 8; state embedding dimension (d in Section 3) 256; feature embedding dimension (d_feature in Section 3) 128; maximum sequence length (T in Section 3) 80; feedforward network dimension 2048; nonlinearity ReLU; batch size 256; dropout 0.1; learning rate 2.5 × 10⁻⁴; warm-up period 10⁵; learning rate decay inverse square root. A configuration sketch follows the table. |
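
To make the Table 2 settings concrete, the following is a minimal PyTorch sketch of a belief-model transformer instantiated with those hyperparameters. The class name, argument names (`state_dim`, `num_features`, `feature_vocab`), and the output head are illustrative assumptions, not taken from the authors' released codebase at https://github.com/gfppoy/hanabi-belief-transformer; only the numeric settings come from the paper's Table 2.

```python
# Sketch: transformer for belief emulation, configured per Table 2.
# Names and I/O shapes are assumptions for illustration only.
import torch
import torch.nn as nn


class BeliefTransformer(nn.Module):
    def __init__(
        self,
        state_dim: int,          # size of the encoded observation (environment-specific)
        num_features: int,       # number of unobservable features to predict
        feature_vocab: int,      # number of possible values per feature
        d_model: int = 256,      # state embedding dimension (d in Section 3)
        d_feature: int = 128,    # feature embedding dimension (d_feature in Section 3)
        max_len: int = 80,       # maximum sequence length (T in Section 3)
        num_layers: int = 6,
        num_heads: int = 8,
        dim_feedforward: int = 2048,
        dropout: float = 0.1,
    ):
        super().__init__()
        self.state_embed = nn.Linear(state_dim, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=num_heads,
            dim_feedforward=dim_feedforward,
            dropout=dropout,
            activation="relu",
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Project encoded states to per-feature logits, trained with cross-entropy
        # over the unobservable features.
        self.feature_proj = nn.Linear(d_model, d_feature)
        self.feature_head = nn.Linear(d_feature, num_features * feature_vocab)
        self.num_features = num_features
        self.feature_vocab = feature_vocab

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        # states: (T, batch, state_dim) trajectory of observations
        seq_len = states.size(0)
        positions = torch.arange(seq_len, device=states.device).unsqueeze(1)
        x = self.state_embed(states) + self.pos_embed(positions)
        h = self.encoder(x)  # (T, batch, d_model)
        logits = self.feature_head(torch.relu(self.feature_proj(h)))
        # (T, batch, num_features, feature_vocab)
        return logits.view(seq_len, -1, self.num_features, self.feature_vocab)
```

Under the same assumptions, the remaining Table 2 settings would apply to the training loop rather than the model: batches of 256 trajectories, an Adam-style optimizer with a peak learning rate of 2.5 × 10⁻⁴, a warm-up of 10⁵ steps, and inverse-square-root learning rate decay thereafter.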