Understanding Self-Predictive Learning for Reinforcement Learning

Authors: Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Avila Pires, Yash Chandak, Remi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We examine the robustness of our theoretical insights with a number of small-scale experiments and showcase the promise of the novel representation learning algorithm with large-scale experiments.
Researcher Affiliation | Collaboration | Google DeepMind; University of Massachusetts; University of Oxford.
Pseudocode | Yes | Algorithm 1: Self-predictive learning. Algorithm 2: Bidirectional self-predictive learning. (An illustrative sketch of the self-predictive update follows the table.)
Open Source Code | No | The paper does not provide any links to source code or explicitly state that source code for the described methodology is available.
Open Datasets | Yes | Our testbed is DMLab-30, a collection of 30 diverse partially observable cognitive tasks in the 3D DeepMind Lab (Beattie et al., 2016).
Dataset Splits | No | The paper mentions using DMLab-30 and randomly generated MDPs for experiments but does not provide specific details on train/validation/test splits (e.g., percentages, sample counts, or explicit split methodologies).
Hardware Specification | No | The paper describes its deep RL implementation and experiments but does not provide specific details about the hardware used, such as GPU models, CPU types, or cloud computing specifications.
Software Dependencies | Yes | In Figs. 3 and 4, we simulate the exact ODE dynamics using the Scipy ODE solver (Virtanen et al., 2020). (A sketch of this kind of ODE simulation follows the table.)
Experiment Setup | Yes | Fig. 9(a) shows the effect of finite learning rate on the preservation of the cosine similarity between two representation vectors ϕ_{1,t} and ϕ_{2,t}. ... We consider a grid of learning rates η ∈ {0.01, 0.1, 1, 10}. (A sketch of this learning-rate sweep follows the table.)