Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning
Authors: Zhaohan Daniel Guo, Bernardo Ávila Pires, Bilal Piot, Jean-Bastien Grill, Florent Altché, Rémi Munos, Mohammad Gheshlaghi Azar
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show in our experiments that PBL delivers across-the-board improved performance over state-of-the-art deep RL agents in the DMLab-30 and Atari-57 multitask suites. |
| Researcher Affiliation | Industry | ¹DeepMind, London, UK; ²DeepMind, Paris, France. |
| Pseudocode | Yes | See algorithm 1 for a high-level overview of the algorithm. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the methodology described. |
| Open Datasets | Yes | We evaluated PBL in the DMLab-30 task set (Beattie et al., 2016). Additionally, 'We also ran PBL along with pixel control, CPC and RL-only on the Atari-57 (Bellemare et al., 2013)'. |
| Dataset Splits | No | While the paper uses standard benchmarks (DMLab 30, Atari-57) that typically have predefined splits, it does not explicitly provide the training/validation/test split percentages or counts used in their specific experiments. |
| Hardware Specification | Yes | All experiments were run on NVIDIA P100 GPUs. |
| Software Dependencies | No | The paper mentions 'TensorFlow' in the bibliography but does not specify its version number or any other software dependencies with specific version details. |
| Experiment Setup | Yes | For PBL, Simcore DRAW, and CPC, we predict up to 20 steps into the future. 1) We increased the LSTM from one 256-unit hidden layer to two 512-unit hidden layers; 2) we changed the linear policy and value heads to each have an additional 512-unit hidden layer. For PBL, we use MLPs with 2 hidden layers of 512 units each. The entropy cost is set to 0.01; the unroll length is set to 30. |
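
The experiment-setup details quoted above can be gathered into a single configuration for reference. The sketch below is illustrative only, not the authors' implementation: it assumes PyTorch, and the module names (`PBLConfig`, `PBLSketch`) and overall wiring are hypothetical, but the sizes mirror what the paper reports (two 512-unit LSTM layers, 512-unit policy/value hidden layers, PBL predictors as 2-layer 512-unit MLPs, a 20-step prediction horizon, entropy cost 0.01, unroll length 30).

```python
# Illustrative sketch (not the authors' code): the reported hyperparameters and
# network shapes from the Experiment Setup row, expressed as a PyTorch module.
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class PBLConfig:
    lstm_hidden: int = 512     # two 512-unit hidden layers (increased from one 256-unit layer)
    lstm_layers: int = 2
    head_hidden: int = 512     # extra 512-unit hidden layer in the policy and value heads
    mlp_hidden: int = 512      # PBL predictors: MLPs with 2 hidden layers of 512 units
    predict_steps: int = 20    # predict up to 20 steps into the future
    entropy_cost: float = 0.01
    unroll_length: int = 30


class PBLSketch(nn.Module):
    """Rough shape of the recurrent core, prediction MLP, and policy/value heads."""

    def __init__(self, obs_dim: int, latent_dim: int, num_actions: int,
                 cfg: PBLConfig = PBLConfig()):
        super().__init__()
        self.cfg = cfg
        # Recurrent core: two 512-unit LSTM layers.
        self.core = nn.LSTM(obs_dim, cfg.lstm_hidden, num_layers=cfg.lstm_layers)
        # Latent predictor: 2 hidden layers of 512 units each.
        self.predictor = nn.Sequential(
            nn.Linear(cfg.lstm_hidden, cfg.mlp_hidden), nn.ReLU(),
            nn.Linear(cfg.mlp_hidden, cfg.mlp_hidden), nn.ReLU(),
            nn.Linear(cfg.mlp_hidden, latent_dim),
        )
        # Policy and value heads, each with an additional 512-unit hidden layer.
        self.policy = nn.Sequential(
            nn.Linear(cfg.lstm_hidden, cfg.head_hidden), nn.ReLU(),
            nn.Linear(cfg.head_hidden, num_actions),
        )
        self.value = nn.Sequential(
            nn.Linear(cfg.lstm_hidden, cfg.head_hidden), nn.ReLU(),
            nn.Linear(cfg.head_hidden, 1),
        )

    def forward(self, obs_seq: torch.Tensor):
        # obs_seq: [unroll_length, batch, obs_dim]
        core_out, _ = self.core(obs_seq)
        return self.policy(core_out), self.value(core_out), self.predictor(core_out)
```

This is only meant to make the quoted sizes concrete; training losses, the bootstrapped latent targets, and the distributed actor-learner setup used in the paper are not represented here.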