Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Generalized Beliefs for Cooperative AI
Authors: Darius Muglich, Luisa M Zintgraf, Christian A Schroeder De Witt, Shimon Whiteson, Jakob Foerster
ICML 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 2 shows the results of our generalized belief learning methodology (see Appendix C for an intuitive interpretation of cross-entropy scores over unobservable features in a Dec-POMDP), and Table 1 contains the results of our experiments on the generalized belief's ability to improve cross-play. |
| Researcher Affiliation | Academia | 1University of Oxford, England, United Kingdom. Correspondence to: Darius Muglich <EMAIL>. |
| Pseudocode | No | The paper describes methodologies and processes but does not include any explicit pseudocode blocks or figures labeled 'Algorithm'. |
| Open Source Code | Yes | The code for learning a belief model with the transformer architecture may be found here: https://github.com/gfppoy/hanabi-belief-transformer. |
| Open Datasets | Yes | We use the AI benchmark task and representative Dec-POMDP Hanabi (Bard et al., 2020) for our experiments. We used thirteen pre-trained simplified action decoder (SAD) policies that were used in the work of Hu et al. (2020), and which we downloaded from their GitHub repository. |
| Dataset Splits | No | The paper describes training on a collection of pre-trained policies and testing on policies not seen at training time, but it does not provide specific numerical train/validation/test dataset splits (e.g., percentages or sample counts) for the data used to train the belief model. |
| Hardware Specification | Yes | The machine used for experimentation consisted of 2 NVIDIA GeForce RTX 2080 Ti GPUs and 40 CPU cores. |
| Software Dependencies | No | The paper mentions using specific codebases and a ReLU nonlinearity but does not list specific version numbers for software dependencies like Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | Table 2. Hyperparameter settings of transformer for belief emulation: number of layers 6; number of attention heads 8; state embedding dimension (d in Section 3) 256; feature embedding dimension (d_feature in Section 3) 128; maximum sequence length (T in Section 3) 80; feedforward network dimension 2048; nonlinearity ReLU; batch size 256; dropout 0.1; learning rate 2.5×10⁻⁴; warm-up period 10⁵; learning rate decay inverse square root. |
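The warm-up and inverse-square-root decay quoted from Table 2 can be sketched as a small schedule function. This is a minimal illustration, not the authors' code: the paper (as quoted here) gives only the warm-up period (10⁵ steps), the peak learning rate (2.5×10⁻⁴), and the decay family, so the exact functional form below is an assumption based on the common Transformer-style schedule.

```python
import math

BASE_LR = 2.5e-4   # peak learning rate from Table 2
WARMUP = 10 ** 5   # warm-up period from Table 2

def learning_rate(step: int) -> float:
    """Linear warm-up to BASE_LR, then inverse-square-root decay.

    Assumed shape: lr(step) = BASE_LR * min(step / WARMUP, sqrt(WARMUP / step)),
    which equals BASE_LR exactly at step == WARMUP.
    """
    step = max(step, 1)  # guard against step 0
    return BASE_LR * min(step / WARMUP, math.sqrt(WARMUP / step))
```

For example, the schedule peaks at 2.5×10⁻⁴ when `step == 10**5` and halves by `step == 4 * 10**5`, since `sqrt(10**5 / (4 * 10**5)) == 0.5`.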