Understanding Self-Predictive Learning for Reinforcement Learning
Authors: Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Avila Pires, Yash Chandak, Remi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We examine the robustness of our theoretical insights with a number of small-scale experiments and showcase the promise of the novel representation learning algorithm with large-scale experiments. |
| Researcher Affiliation | Collaboration | Google DeepMind; University of Massachusetts; University of Oxford. |
| Pseudocode | Yes | Algorithm 1: Self-predictive learning. Algorithm 2: Bidirectional self-predictive learning. (A minimal sketch of the Algorithm 1 update follows the table.) |
| Open Source Code | No | The paper does not provide any links to source code or explicitly state that source code for the described methodology is available. |
| Open Datasets | Yes | Our testbed is DMLab-30, a collection of 30 diverse partially observable cognitive tasks in the 3D DeepMind Lab (Beattie et al., 2016). |
| Dataset Splits | No | The paper mentions using DMLab-30 and randomly generated MDPs for experiments but does not provide specific details on train/validation/test splits (e.g., percentages, sample counts, or explicit split methodologies). |
| Hardware Specification | No | The paper describes its deep RL implementation and experiments but does not provide specific details about the hardware used, such as GPU models, CPU types, or cloud computing specifications. |
| Software Dependencies | Yes | In Figs. 3 and 4, we simulate the exact ODE dynamics using the SciPy ODE solver (Virtanen et al., 2020). (A hedged ODE-simulation sketch follows the table.) |
| Experiment Setup | Yes | Fig. 9(a) shows the effect of a finite learning rate on the preservation of the cosine similarity between two representation vectors ϕ1,t and ϕ2,t. ... We consider a grid of learning rates η ∈ {0.01, 0.1, 1, 10}. (A sketch of this sweep follows the table.) |
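
On the pseudocode row: below is a minimal NumPy sketch of the matrix-case self-predictive update that the paper's theory analyzes (Algorithm 1 style), assuming the two-timescale setup in which the latent prediction matrix P is solved in closed form and the representation Φ takes a semi-gradient step against a stop-gradient target T^πΦ. The toy MDP, variable names, and step counts are our assumptions, not the authors' released code.

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 10, 3                        # number of states, representation dimension (assumed)
T = rng.random((n, n))
T /= T.sum(axis=1, keepdims=True)   # row-stochastic transition matrix (plays the role of T^pi)
Phi = rng.standard_normal((n, k))   # representation: one k-dim row per state

eta = 0.01                          # slow-timescale learning rate (assumed)
for _ in range(500):
    # Fast timescale: closed-form latent predictor, argmin_P ||Phi P - T Phi||_F^2.
    P, *_ = np.linalg.lstsq(Phi, T @ Phi, rcond=None)
    # Slow timescale: semi-gradient step on Phi; the target T @ Phi is treated
    # as a constant (stop-gradient), which is the defining feature of
    # self-predictive learning.
    Phi -= eta * (Phi @ P - T @ Phi) @ P.T
```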
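On the software-dependencies row: the paper reports simulating the exact ODE dynamics with the SciPy ODE solver. A hedged sketch of how such a simulation could be set up with scipy.integrate.solve_ivp, integrating the semi-gradient flow from the sketch above over a flattened Φ; the solver tolerances and time horizon are our guesses, not the paper's reported settings.

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(1)
n, k = 10, 3
T = rng.random((n, n))
T /= T.sum(axis=1, keepdims=True)   # row-stochastic transition matrix

def phi_dot(t, phi_flat):
    """Semi-gradient flow on Phi with the predictor P at its closed-form optimum."""
    Phi = phi_flat.reshape(n, k)
    P, *_ = np.linalg.lstsq(Phi, T @ Phi, rcond=None)
    return (-(Phi @ P - T @ Phi) @ P.T).ravel()

Phi0 = rng.standard_normal((n, k))
sol = solve_ivp(phi_dot, t_span=(0.0, 50.0), y0=Phi0.ravel(), rtol=1e-8, atol=1e-8)
Phi_final = sol.y[:, -1].reshape(n, k)  # representation at the end of the flow
```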
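On the experiment-setup row: a sketch of the finite-learning-rate sweep behind a Fig. 9(a)-style plot, tracking how the cosine similarity between two representation columns (our reading of ϕ1,t and ϕ2,t) drifts under discrete updates for η ∈ {0.01, 0.1, 1, 10}; the continuous-time dynamics preserve this similarity exactly, so any drift is a finite-step-size effect. The initialization, step count, and update rule reuse the assumptions of the sketches above.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(2)
n, k = 10, 2
T = rng.random((n, n))
T /= T.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
Phi0 = rng.standard_normal((n, k))  # shared initialization across learning rates
c0 = cosine(Phi0[:, 0], Phi0[:, 1])

for eta in [0.01, 0.1, 1.0, 10.0]:  # learning-rate grid from the paper
    Phi = Phi0.copy()
    for _ in range(100):
        P, *_ = np.linalg.lstsq(Phi, T @ Phi, rcond=None)
        Phi = Phi - eta * (Phi @ P - T @ Phi) @ P.T
        if not np.isfinite(Phi).all():  # large eta can diverge outright
            break
    drift = cosine(Phi[:, 0], Phi[:, 1]) - c0
    print(f"eta={eta:>5}: cosine-similarity drift = {drift:+.4f}")
```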