CoBERL: Contrastive BERT for Reinforcement Learning

Authors: Andrea Banino, Adria Puigdomenech Badia, Jacob C Walker, Tim Scholtes, Jovana Mitrovic, Charles Blundell

ICLR 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We extensively test our proposed agent across a widely varied set of environments and tasks ranging from 2D platform games to 3D first-person and third-person view tasks." |
| Researcher Affiliation | Industry | DeepMind, London |
| Pseudocode | Yes | "We also report the pseudo-code for the algorithm and the auxiliary loss in Appendix H." (See the hedged loss sketch after this table.) |
| Open Source Code | No | The paper links to the source code of external tools and environments (Arcade Learning Environment, DeepMind Control Suite, DMLab) but makes no explicit statement about an open-source release of the CoBERL implementation itself. |
| Open Datasets | Yes | "We extensively test our proposed agent across a widely varied set of environments and tasks ranging from 2D platform games to 3D first-person and third-person view tasks. Specifically, we test it in the control domain using DeepMind Control Suite (Tassa et al., 2018) and probe its memory abilities using DMLab-30 (Beattie et al., 2016). We also test our agent on all 57 Atari games (Bellemare et al., 2013)." |
| Dataset Splits | No | The paper mentions standard environments and preprocessing but gives no explicit train/validation/test split percentages, sample counts, or details of how splits were managed for reproducibility within its own experimental setup. |
| Hardware Specification | Yes | R2D2: "We train the agent with a single TPU v2-based learner... In particular, we used 8 TPU cores for learning and 2 for inference." V-MPO: "We train the agent with 4 hosts each with 8 TPU v2 cores." |
| Software Dependencies | No | The paper mentions the Adam optimizer, and implies JAX (via `jnp` in the pseudocode) and SciPy (via `integrate`), but specifies no version numbers for any of these dependencies. |
| Experiment Setup | Yes | "The hyper-parameters of all the baselines are tuned individually to maximise performance (see App. C.5 for the detailed procedure)", and tables such as "Table 10: GTrXL Hyperparameters used in all the R2D2 experiments with range of sweep" list specific values and ranges, e.g. "Learning rate {0.0001, 0.0003}", "Batch size 32", "Trace length (Atari) 80". (A hedged configuration sketch follows this table.) |
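The Pseudocode row above points to Appendix H for the algorithm and its contrastive auxiliary loss. CoBERL's actual objective is a RELIC-derived contrastive loss over masked transformer inputs; purely as a hedged sketch of the general idea, a generic InfoNCE-style contrastive loss in JAX (the framework implied by `jnp` in the paper's pseudocode) could look like the code below. The function and argument names are hypothetical, and this is not the paper's exact formulation.

```python
import jax.numpy as jnp
from jax.nn import log_softmax


def info_nce_loss(queries, keys, temperature=0.1):
    """Generic InfoNCE-style contrastive loss (hypothetical sketch).

    queries: [N, D] array, e.g. predictions at masked positions.
    keys:    [N, D] array of matching targets; row i of `keys` is the
             positive for row i of `queries`, all other rows act as
             in-batch negatives.
    """
    # L2-normalise so dot products become cosine similarities.
    queries = queries / jnp.linalg.norm(queries, axis=-1, keepdims=True)
    keys = keys / jnp.linalg.norm(keys, axis=-1, keepdims=True)

    # Pairwise similarity logits, scaled by the temperature.
    logits = queries @ keys.T / temperature  # shape [N, N]

    # The positive pair for each query sits on the diagonal.
    log_probs = log_softmax(logits, axis=-1)
    return -jnp.mean(jnp.diagonal(log_probs))
```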
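The Experiment Setup row quotes concrete values from the paper's Table 10 (learning-rate sweep {0.0001, 0.0003}, batch size 32, Atari trace length 80). As a minimal sketch of how a replication attempt might record those reported values, assuming nothing beyond the quoted table entries, one could keep them in a plain configuration dict; every key name below is invented for illustration.

```python
# Hypothetical replication config. Only the values quoted from the
# paper's Table 10 are grounded; the key names are invented here.
GTRXL_R2D2_CONFIG = {
    "learning_rate_sweep": (1e-4, 3e-4),  # "Learning rate {0.0001, 0.0003}"
    "batch_size": 32,                     # "Batch size 32"
    "trace_length_atari": 80,             # "Trace length (Atari) 80"
    "optimizer": "adam",                  # the paper reports using Adam
}
```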