CoBERL: Contrastive BERT for Reinforcement Learning
Authors: Andrea Banino, Adria Puigdomenech Badia, Jacob C Walker, Tim Scholtes, Jovana Mitrovic, Charles Blundell
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively test our proposed agent across a widely varied set of environments and tasks ranging from 2D platform games to 3D first-person and third-person view tasks. |
| Researcher Affiliation | Industry | DeepMind, London |
| Pseudocode | Yes | We also report the pseudo-code for the algorithm and the auxiliary loss in Appendix H |
| Open Source Code | No | The paper provides links to the source code of external tools and environments used (Arcade Learning Environment, DeepMind Control Suite, DMLab), but does not provide a specific link or explicit statement about the open-source release of the COBERL implementation itself. |
| Open Datasets | Yes | We extensively test our proposed agent across a widely varied set of environments and tasks ranging from 2D platform games to 3D first-person and third-person view tasks. Specifically, we test it in the control domain using the DeepMind Control Suite (Tassa et al., 2018) and probe its memory abilities using DMLab-30 (Beattie et al., 2016). We also test our agent on all 57 Atari games (Bellemare et al., 2013). |
| Dataset Splits | No | The paper mentions using standard environments and preprocessing but does not provide train/validation/test split percentages, sample counts, or details about how such splits were managed within its own experimental setup. |
| Hardware Specification | Yes | R2D2: We train the agent with a single TPU v2-based learner... In particular, we used 8 TPU cores for learning and 2 for inference. V-MPO: We train the agent with 4 hosts, each with 8 TPU v2 cores. |
| Software Dependencies | No | The paper mentions using the Adam optimizer, and implies JAX (via `jnp` in the pseudocode) and SciPy (via `integrate`), but it does not specify version numbers for any of these software dependencies. |
| Experiment Setup | Yes | "The hyper-parameters of all the baselines are tuned individually to maximise performance (see App. C.5 for the detailed procedure)", together with tables such as "Table 10: GTrXL Hyperparameters used in all the R2D2 experiments with range of sweep", which lists specific values and ranges for parameters like "Learning rate {0.0001, 0.0003}", "Batch size 32", and "Trace length (Atari) 80". |
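
For reference, the hyperparameter values quoted in the Experiment Setup row can be read as a small sweep definition. The sketch below is a hypothetical Python snippet, not part of the paper's released code: only the three values named above (learning rate {0.0001, 0.0003}, batch size 32, Atari trace length 80) come from the source, while the dictionary names and the `expand_sweep` helper are illustrative assumptions.

```python
# Hypothetical sketch of the GTrXL / R2D2 hyperparameters quoted from Table 10.
# Only the values cited in the row above are grounded in the source; the
# structure and helper below are illustrative assumptions.
from itertools import product

base_config = {
    "batch_size": 32,          # "Batch size 32"
    "trace_length_atari": 80,  # "Trace length (Atari) 80"
}

# Brace-enclosed values in the table appear to denote a sweep over alternatives.
sweep = {
    "learning_rate": [0.0001, 0.0003],  # "Learning rate {0.0001, 0.0003}"
}

def expand_sweep(base, sweep):
    """Yield one full configuration per combination of swept values."""
    keys = list(sweep)
    for values in product(*(sweep[k] for k in keys)):
        config = dict(base)
        config.update(dict(zip(keys, values)))
        yield config

for config in expand_sweep(base_config, sweep):
    print(config)
```

Since Table 10 is described as listing hyperparameters "with range of sweep", the helper simply expands each braced range into one configuration per combination, which matches how such sweeps are usually enumerated.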