CoBERL: Contrastive BERT for Reinforcement Learning
Authors: Andrea Banino, Adria Puigdomenech Badia, Jacob C Walker, Tim Scholtes, Jovana Mitrovic, Charles Blundell
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively test our proposed agent across a widely varied set of environments and tasks ranging from 2D platform games to 3D first-person and third-person view tasks. |
| Researcher Affiliation | Industry | DeepMind, London |
| Pseudocode | Yes | We also report the pseudo-code for the algorithm and the auxiliary loss in Appendix H |
| Open Source Code | No | The paper provides links to the source code of external tools and environments used (Arcade Learning Environment, DeepMind Control Suite, DMLab), but does not provide a specific link or explicit statement about the open-source release of the COBERL implementation itself. |
| Open Datasets | Yes | We extensively test our proposed agent across a widely varied set of environments and tasks ranging from 2D platform games to 3D first-person and third-person view tasks. Specifically, we test it in the control domain using the DeepMind Control Suite (Tassa et al., 2018) and probe its memory abilities using DMLab-30 (Beattie et al., 2016). We also test our agent on all 57 Atari games (Bellemare et al., 2013). |
| Dataset Splits | No | The paper mentions using standard environments and preprocessing but does not provide train/validation/test split percentages, sample counts, or details about how such splits were managed within its own experimental setup. |
| Hardware Specification | Yes | R2D2: We train the agent with a single TPU v2-based learner... In particular, we used 8 TPU cores for learning and 2 for inference. V-MPO: We train the agent with 4 hosts, each with 8 TPU v2 cores. |
| Software Dependencies | No | The paper mentions using the Adam optimizer, and implies JAX (via `jnp` in the pseudocode) and SciPy (via `integrate`), but it does not specify version numbers for any of these software dependencies. |
| Experiment Setup | Yes | "The hyper-parameters of all the baselines are tuned individually to maximise performance (see App. C.5 for the detailed procedure)", together with tables such as "Table 10: GTrXL Hyperparameters used in all the R2D2 experiments with range of sweep", which lists specific values and ranges for parameters like "Learning rate {0.0001, 0.0003}", "Batch size 32", and "Trace length (Atari) 80". |
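
For reference, the hyperparameter values quoted in the Experiment Setup row can be read as a small sweep definition. The sketch below is a hypothetical Python snippet, not part of the paper's released code: only the three values named above (learning rate {0.0001, 0.0003}, batch size 32, Atari trace length 80) come from the source, while the dictionary names and the `expand_sweep` helper are illustrative assumptions.

```python
# Hypothetical sketch of the GTrXL / R2D2 hyperparameters quoted from Table 10.
# Only the values cited in the row above are grounded in the source; the
# structure and helper below are illustrative assumptions.
from itertools import product

base_config = {
    "batch_size": 32,          # "Batch size 32"
    "trace_length_atari": 80,  # "Trace length (Atari) 80"
}

# Brace-enclosed values in the table appear to denote a sweep over alternatives.
sweep = {
    "learning_rate": [0.0001, 0.0003],  # "Learning rate {0.0001, 0.0003}"
}

def expand_sweep(base, sweep):
    """Yield one full configuration per combination of swept values."""
    keys = list(sweep)
    for values in product(*(sweep[k] for k in keys)):
        config = dict(base)
        config.update(dict(zip(keys, values)))
        yield config

for config in expand_sweep(base_config, sweep):
    print(config)
```

Since Table 10 is described as listing hyperparameters "with range of sweep", the helper simply expands each braced range into one configuration per combination, which matches how such sweeps are usually enumerated.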