Off-Belief Learning

Authors: Hengyuan Hu, Adam Lerer, Brandon Cui, Luis Pineda, Noam Brown, Jakob Foerster

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We then evaluate OBL in both a toy setting and Hanabi. In the toy setting, we demonstrate that OBL learns an optimal grounded policy while other existing methods such as SP and Cognitive Hierarchies do not. In Hanabi, OBL finds fully-grounded policies that reach a score of 20.92 in SP without relying on conventions, an important data point that tells us how well we can perform in this benchmark without conventions.
Researcher Affiliation | Industry | Facebook AI Research.
Pseudocode | No | The paper provides diagrams and textual descriptions of its algorithms (Figure 1), but no formal pseudocode or algorithm blocks.
Open Source Code | Yes | We will open source our code and all models.
Open Datasets | No | The paper uses the Hanabi environment, a popular benchmark, and refers to 'human game data collected from an online board game platform' used to train the Clone Bot, but it does not provide concrete access information (link, DOI, repository, or citation) through which these datasets could be obtained.
Dataset Splits | No | Because this is a reinforcement learning paper, data are generated dynamically through interaction with the environment rather than drawn from a static dataset with predefined training, validation, and test splits; the paper does not report split percentages or sample counts.
Hardware Specification | No | The paper mentions running experiments 'efficiently on GPUs' but does not provide specific details on the GPU models (e.g., NVIDIA A100, RTX 2080 Ti), CPU models, or other hardware components used.
Software Dependencies | No | The paper names R2D2 as its backbone and lists the deep learning techniques it relies on (double-DQN, prioritized experience replay, the Adam optimizer), but it does not specify version numbers for any software libraries, programming languages (e.g., Python 3.x), or frameworks used in the implementation.
Experiment Setup | No | The paper describes the general training process, including parallel environments, Q-function approximation, and a replay buffer, and mentions a 'temperature hyperparameter T', but the main text gives no concrete values for learning rates, batch sizes, number of epochs, or detailed network architectures; it defers 'neural network design, hyper-parameters and computation cost' to Appendix B.
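
The Software Dependencies row notes that the paper builds on an R2D2 backbone with double-DQN, prioritized experience replay, and the Adam optimizer, without pinning library versions. Purely as a point of reference, the following is a minimal PyTorch-style sketch of the double-DQN bootstrap target that such a backbone typically computes; the function and argument names are illustrative and are not taken from the paper's code.

```python
import torch

def double_dqn_target(online_q, target_q, next_obs, reward, done, gamma=0.99):
    # Double-DQN: the online network selects the greedy next action,
    # while the target network evaluates it, reducing overestimation bias.
    with torch.no_grad():
        next_action = online_q(next_obs).argmax(dim=-1, keepdim=True)
        next_value = target_q(next_obs).gather(-1, next_action).squeeze(-1)
        # Zero out the bootstrap term on terminal transitions.
        return reward + gamma * (1.0 - done) * next_value
```

In an R2D2-style setup the same target is computed over recurrent value estimates unrolled on sequences drawn from a prioritized replay buffer.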
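The Experiment Setup row mentions a temperature hyperparameter T without a reported value. A common way such a temperature enters Q-learning agents is through a Boltzmann (softmax) policy over Q-values; the sketch below is a generic illustration under that assumption, not the paper's implementation.

```python
import torch

def boltzmann_action(q_values, temperature):
    # Softmax over Q-values: small T approaches the greedy policy,
    # large T approaches a uniform random policy.
    probs = torch.softmax(q_values / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)
```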