Exploring the Promise and Limits of Real-Time Recurrent Learning

Authors: Kazuki Irie, Anand Gopalakrishnan, Jürgen Schmidhuber

ICLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Here we present the main experiments of this work: RL in POMDPs using realistic game environments requiring memory. Other synthetic/diagnostic experiments are reported in Appendix B.1/B.2. DMLab Memory Tasks. DMLab-30 (Beattie et al., 2016) is a collection of 30 first-person 3D game environments, with a mix of both memory and reactive tasks. Here we focus on two well-known environments, rooms_select_nonmatching_object and rooms_watermaze, which are both categorised as memory tasks according to Parisotto et al. (2020).
Researcher Affiliation Academia Kazuki Irie (1), Anand Gopalakrishnan (2), Jürgen Schmidhuber (2,3). (1) Center for Brain Science, Harvard University, Cambridge, MA, USA; (2) The Swiss AI Lab, IDSIA, USI & SUPSI, Lugano, Switzerland; (3) AI Initiative, KAUST, Thuwal, Saudi Arabia. kirie@fas.harvard.edu, {anand, juergen}@idsia.ch
Pseudocode No The paper provides mathematical derivations and descriptions of algorithms but does not contain a dedicated pseudocode or algorithm block.
Open Source Code Yes Our code is public: https://github.com/IDSIA/rtrl-elstm
Open Datasets Yes DMLab-30 (Beattie et al., 2016), ProcGen (Cobbe et al., 2020), and Atari 2600 (Bellemare et al., 2013) environments.
Dataset Splits No The paper describes evaluation on 'test episodes' and pre-training steps, but does not explicitly provide details about distinct training, validation, and test dataset splits with percentages or sample counts.
Hardware Specification Yes Each run requires about a day of training on a V100 GPU (this is also the case with ProcGen and Atari).
Software Dependencies No The paper mentions 'PyTorch code' and 'torchbeast (Küttler et al., 2019)' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup Yes Table 4: Hyper-parameters for RL experiments. Parameters at the bottom are common to all settings (which are essentially taken from the Atari configuration of Espeholt et al. (2018)).