Bootstrapped Representations in Reinforcement Learning

Authors: Charline Le Lan, Stephen Tu, Mark Rowland, Anna Harutyunyan, Rishabh Agarwal, Marc G. Bellemare, Will Dabney

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We complement our theoretical results with an empirical comparison of these learning rules for different cumulant functions on classic domains such as the four-room domain (Sutton et al., 1999) and Mountain Car (Moore, 1990). We present an empirical evaluation that supports our theoretical characterizations and show the importance of the choice of a learning rule to learn the value function in Section 5. Figure 4: Subspace distance... over the course of training. Figure 5: Subspace distance after 5 × 10^5 training steps. Figure 6: Comparing effects of offline pre-training on the Four Rooms (left) and sparse Mountain Car (right) domains for different cumulant generation methods. (A generic subspace-distance sketch appears after the table.)
Researcher Affiliation | Collaboration | ¹University of Oxford, ²Google DeepMind. Correspondence to: Charline Le Lan <charline.lelan@stats.ox.ac.uk>.
Pseudocode | No | The paper does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper states 'We also thank Jesse Farebrother and Joshua Greaves for help with the Proto-Value Networks codebase (Farebrother et al., 2023)', which refers to using an external codebase rather than releasing their own source code for the methodology described in the paper.
Open Datasets | Yes | We complement our theoretical results with an empirical comparison of these learning rules for different cumulant functions on classic domains such as the four-room domain (Sutton et al., 1999) and Mountain Car (Moore, 1990). These are well-known, publicly available benchmark environments. (See the environment-loading sketch after the table.)
Dataset Splits | No | The paper mentions 'The offline pre-training dataset contains 100000 and 200000 transitions for four-room and mountain car respectively' and training uses a 'replay buffer', but it does not specify explicit training, validation, or test splits with percentages or counts.
Hardware Specification | No | The paper does not specify any particular hardware details such as GPU models, CPU types, or memory used for the experiments.
Software Dependencies | No | The paper cites 'NumPy (Oliphant, 2006; Walt et al., 2011; Harris et al., 2020), SciPy (Jones et al., 2001), Matplotlib (Hunter, 2007) and JAX (Bradbury et al., 2018)'. While these libraries are mentioned, specific version numbers (e.g., NumPy 1.20) are not provided. (A sketch for recording installed versions follows the table.)
Experiment Setup | Yes | 'In this experiment, we selected a step size α = 0.08 for all the algorithms.' 'We use a step size α = 5e-3 and train the different learning rules for 500k steps with 3 seeds.' 'The learning rate for both offline and online training was the same as the standard DQN learning rate (0.00025), and similarly for the optimizer epsilon.' 'The network architecture is a simple fully connected MLP with ReLU activations (Nair and Hinton, 2010) and two hidden layers of size 512 (first) and 256 (second), followed by a linear layer to give action-values.' (A sketch of this network follows the table.)
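
The subspace distance reported in Figures 4 and 5 is not defined in this excerpt. Below is a minimal sketch of one standard projection-based distance between the column spaces of two feature matrices; the function names, normalization, and the assumption of full column rank are ours, and the paper's exact definition may differ.

```python
# Hedged sketch: a generic distance between the column spaces of two
# feature matrices, not necessarily the paper's exact definition.
import jax.numpy as jnp

def projector(phi):
    # Orthogonal projector onto the column space of phi, shape (num_states, d);
    # assumes phi has full column rank.
    q, _ = jnp.linalg.qr(phi)
    return q @ q.T

def subspace_distance(phi_a, phi_b):
    # Squared Frobenius distance between the two projectors, rescaled to lie
    # in [0, 1] when both matrices have d linearly independent columns.
    d = phi_a.shape[1]
    p_a, p_b = projector(phi_a), projector(phi_b)
    return (jnp.linalg.norm(p_a - p_b) ** 2) / (2 * d)

phi_a = jnp.eye(4)[:, :2]   # span of the first two basis vectors
phi_b = jnp.eye(4)[:, 1:3]  # overlaps with phi_a in one direction
print(subspace_distance(phi_a, phi_b))  # 0.5
```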
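For the Open Datasets row, here is a minimal sketch of loading the classic Mountain Car benchmark through Gymnasium. The paper's sparse-reward Mountain Car variant and the four-room gridworld are not standard Gymnasium environments, so this only illustrates how readily available the underlying benchmark is.

```python
# Hedged sketch: the classic Mountain Car control task via Gymnasium.
import gymnasium as gym

env = gym.make("MountainCar-v0")
obs, info = env.reset(seed=0)
for _ in range(10):
    # Random actions, just to show the interaction loop.
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```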
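For the Software Dependencies row, a small sketch of how a reproduction could record the library versions it actually used, since the paper names NumPy, SciPy, Matplotlib and JAX without version numbers.

```python
# The authors' exact versions are unknown, so a reproduction should log its own.
import jax
import matplotlib
import numpy
import scipy

for module in (numpy, scipy, matplotlib, jax):
    print(f"{module.__name__}=={module.__version__}")
```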
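For the Experiment Setup row, a minimal JAX sketch of the described value network: two ReLU hidden layers of 512 and 256 units followed by a linear action-value head. The input and action dimensions, the initialization scheme, and the function names are illustrative assumptions; the excerpt does not specify them.

```python
# Hedged sketch of the described MLP value network, assuming a small
# observation (obs_dim=2, e.g. Mountain Car) and 3 discrete actions.
import jax
import jax.numpy as jnp

def init_params(key, obs_dim, num_actions, hidden=(512, 256)):
    # He-style initialization is an assumption, not taken from the paper.
    sizes = (obs_dim,) + hidden + (num_actions,)
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        w = jax.random.normal(sub, (d_in, d_out)) * jnp.sqrt(2.0 / d_in)
        params.append((w, jnp.zeros(d_out)))
    return params

def q_network(params, obs):
    x = obs
    for w, b in params[:-1]:
        x = jax.nn.relu(x @ w + b)  # ReLU hidden layers (512, then 256)
    w, b = params[-1]
    return x @ w + b                # linear head producing action-values

params = init_params(jax.random.PRNGKey(0), obs_dim=2, num_actions=3)
q_values = q_network(params, jnp.ones(2))
print(q_values.shape)  # (3,) -- one value per action
```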