Understanding and Preventing Capacity Loss in Reinforcement Learning

Authors: Clare Lyle, Mark Rowland, Will Dabney

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present a rigorous empirical analysis of this phenomenon which considers both the ability of networks to learn new target functions via gradient-based optimization methods, and their ability to linearly disentangle states' feature representations.
Researcher Affiliation | Collaboration | Clare Lyle, Department of Computer Science, University of Oxford; Mark Rowland & Will Dabney, DeepMind.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Agent: We train a Rainbow agent (Hessel et al., 2018) with the same architecture and hyperparameters as are described in the open-source implementation made available by Quan & Ostrovski (2020). URL: http://github.com/deepmind/dqn_zoo.
Open Datasets | Yes | To evaluate Hypothesis 1, we construct a series of toy iterative prediction problems on the MNIST dataset, a widely-used computer vision benchmark which consists of images of handwritten digits and corresponding labels. (A sketch of one such iterative prediction setup appears below the table.)
Dataset Splits | Yes | Training: We follow the training procedure found in the Rainbow implementation mentioned above. We train for 200 million frames, with 500K evaluation frames interspersed every 1M training frames.
Hardware Specification | No | The paper does not provide specific hardware details, such as the GPU or CPU models used to run the experiments.
Software Dependencies | No | The paper mentions software such as the JAX-based Haiku library and refers to specific algorithms (DQN, QR-DQN, Rainbow, DDQN), but does not provide version numbers for these libraries or frameworks.
Experiment Setup | Yes | The evaluations in Figure 5 are for k = 10 heads with β = 100 and α = 0.1, and we show the method's robustness to these hyperparameters in Appendix C.1. (A sketch of such an auxiliary-head regularizer appears below the table.)
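
The MNIST-based "iterative prediction problems" quoted in the Open Datasets row are only described at a high level here. The sketch below is a minimal, hedged illustration of that idea: a single network is trained on a sequence of relabeled MNIST-style targets, so that fitting each new target probes how much capacity the network retains. The relabeling scheme (a fresh random permutation of the 10 labels per task), the two-layer MLP, the SGD settings, and the use of random arrays as stand-ins for MNIST images are all assumptions, not details taken from the paper.

```python
# Hedged sketch of an iterative prediction probe: train one network on a
# sequence of relabeled targets and watch how well it fits each new one.
# Random arrays stand in for MNIST images; all hyperparameters are assumptions.
import jax
import jax.numpy as jnp

def init_mlp(key, sizes=(784, 256, 10)):
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) / jnp.sqrt(m), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def forward(params, x):
    for w, b in params[:-1]:
        x = jax.nn.relu(x @ w + b)
    w, b = params[-1]
    return x @ w + b

def loss_fn(params, x, y):
    log_probs = jax.nn.log_softmax(forward(params, x))
    return -jnp.mean(log_probs[jnp.arange(y.shape[0]), y])

@jax.jit
def sgd_step(params, x, y, lr=1e-2):
    grads = jax.grad(loss_fn)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

k_data, k_labels, k_params = jax.random.split(jax.random.PRNGKey(0), 3)
x = jax.random.normal(k_data, (512, 784))            # stand-in for MNIST images
base_labels = jax.random.randint(k_labels, (512,), 0, 10)

params = init_mlp(k_params)
for task in range(5):                                # sequence of target functions
    perm = jax.random.permutation(jax.random.PRNGKey(task), 10)
    y = perm[base_labels]                            # relabel: new target per task
    for _ in range(200):
        params = sgd_step(params, x, y)
    print(f"task {task}: loss after training = {loss_fn(params, x, y):.3f}")
```

Under the capacity-loss hypothesis, the final loss on later tasks would drift upward relative to the first task; the probe itself, not any particular result, is what this sketch illustrates.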
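The hyperparameters quoted in the Experiment Setup row (k = 10 auxiliary heads, β = 100, α = 0.1) belong to the paper's regularization method, whose exact loss is not reproduced in this summary. The sketch below shows one plausible wiring of an auxiliary-head regularizer with those hyperparameters: k linear heads on the penultimate features are regressed toward β-scaled outputs of a frozen copy of the heads and features at initialization, with the whole term weighted by α. The target construction and the squared-error form are assumptions made for illustration, not the paper's exact formulation.

```python
# Hedged sketch of an auxiliary-head regularizer using the quoted hyperparameters
# (k = 10 heads, beta = 100, alpha = 0.1). The beta-scaled "frozen initialization"
# targets and the squared-error form are assumptions, not the paper's exact loss.
import jax
import jax.numpy as jnp

K_HEADS, BETA, ALPHA = 10, 100.0, 0.1

def init_aux_heads(key, feature_dim, num_heads=K_HEADS):
    # One linear head per auxiliary output: features -> scalar.
    return jax.random.normal(key, (num_heads, feature_dim)) / jnp.sqrt(feature_dim)

def aux_outputs(heads, features):
    # features: [batch, feature_dim] -> outputs: [batch, num_heads]
    return features @ heads.T

def aux_regularizer(heads, features, frozen_heads, frozen_features):
    # Regress current auxiliary outputs toward the beta-scaled outputs produced
    # by the frozen copy of the heads on the frozen initial features.
    targets = BETA * aux_outputs(frozen_heads, frozen_features)
    preds = aux_outputs(heads, features)
    return ALPHA * jnp.mean((preds - jax.lax.stop_gradient(targets)) ** 2)

# Hypothetical usage: add the term to the agent's main loss, e.g.
#   total_loss = td_loss + aux_regularizer(heads, phi(x), init_heads, phi_0(x))
# where phi is the current feature map and phi_0 its frozen copy at initialization.
```

The names `phi` and `phi_0` in the usage comment are hypothetical placeholders for the agent's current and initial feature maps; how the frozen copy is stored and evaluated is left to the surrounding training code.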