Understanding and Preventing Capacity Loss in Reinforcement Learning
Authors: Clare Lyle, Mark Rowland, Will Dabney
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a rigorous empirical analysis of this phenomenon which considers both the ability of networks to learn new target functions via gradient-based optimization methods, and their ability to linearly disentangle states' feature representations. |
| Researcher Affiliation | Collaboration | Clare Lyle, Department of Computer Science, University of Oxford; Mark Rowland & Will Dabney, DeepMind |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Agent: We train a Rainbow agent (Hessel et al., 2018) with the same architecture and hyperparameters as are described in the open-source implementation made available by Quan & Ostrovski (2020). URL http://github.com/deepmind/dqn_zoo. |
| Open Datasets | Yes | To evaluate Hypothesis 1, we construct a series of toy iterative prediction problems on the MNIST data set, a widely-used computer vision benchmark which consists of images of handwritten digits and corresponding labels. (A hedged sketch of such an iterative target-fitting probe appears after the table.) |
| Dataset Splits | Yes | Training: We follow the training procedure found in the Rainbow implementation mentioned above. We train for 200 million frames, with 500K evaluation frames interspersed every 1M training frames. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running the experiments. |
| Software Dependencies | No | The paper mentions software such as the Jax Haiku library and refers to specific algorithms like DQN, QR-DQN, Rainbow, and DDQN, but does not provide version numbers for these libraries or frameworks. |
| Experiment Setup | Yes | The evaluations in Figure 5 are for k = 10 heads with β = 100 and α = 0.1, and we show the method's robustness to these hyperparameters in Appendix C.1. (A hedged sketch of this auxiliary-head regularizer also appears below.) |
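
To make the Open Datasets row concrete, the sketch below illustrates the kind of iterative target-fitting probe the paper describes: a single network is asked to fit a sequence of fresh random regression targets, and its final fit on each task is recorded. Synthetic inputs stand in for MNIST images so the example is self-contained, and the names (`predict`, `sgd_step`) are illustrative assumptions, not the authors' code.

```python
import jax
import jax.numpy as jnp

def predict(params, x):
    # Small MLP standing in for the networks probed in the paper.
    return jax.nn.relu(x @ params["w1"]) @ params["w2"]

def mse(params, x, y):
    return jnp.mean((predict(params, x) - y) ** 2)

@jax.jit
def sgd_step(params, x, y, lr=1e-2):
    grads = jax.grad(mse)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

key = jax.random.PRNGKey(0)
kx, k1, k2 = jax.random.split(key, 3)
x = jax.random.normal(kx, (256, 784))          # stand-in for MNIST images
params = {"w1": jax.random.normal(k1, (784, 128)) * 0.05,
          "w2": jax.random.normal(k2, (128, 1)) * 0.05}

# Fit a sequence of fresh random targets with the same network; a declining
# ability to fit later targets is the capacity-loss signature the paper studies.
for task in range(5):
    key, kt = jax.random.split(key)
    y = jax.random.normal(kt, (256, 1))        # new target function
    for _ in range(500):
        params = sgd_step(params, x, y)
    print(f"task {task}: final MSE = {mse(params, x, y):.4f}")
```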
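
The Experiment Setup row quotes the hyperparameters of the paper's InFeR regularizer: k = 10 auxiliary linear heads, output scale β = 100, and loss weight α = 0.1. The JAX sketch below shows one plausible reading of that loss, regressing scaled auxiliary-head outputs onto the outputs of a frozen copy of the network at initialization; the toy architecture and names here are assumptions, not the open-source implementation.

```python
import jax
import jax.numpy as jnp

K, BETA, ALPHA = 10, 100.0, 0.1   # values quoted in the Experiment Setup row

def features(params, x):
    # Toy one-hidden-layer torso standing in for the agent's feature extractor.
    return jax.nn.relu(x @ params["w1"] + params["b1"])

def head_outputs(heads, phi):
    # k auxiliary linear heads on the penultimate features, scaled by beta.
    return BETA * (phi @ heads)               # shape: [batch, K]

def infer_loss(params, heads, init_params, init_heads, x):
    # Regress current scaled head outputs onto those of the frozen initial
    # network; this term is added (weighted by alpha) to the usual RL loss.
    preds = head_outputs(heads, features(params, x))
    targets = jax.lax.stop_gradient(
        head_outputs(init_heads, features(init_params, x)))
    return ALPHA * jnp.mean((preds - targets) ** 2)

# Usage: keep an untouched copy of the parameters from initialization.
key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
params = {"w1": jax.random.normal(k1, (784, 256)) * 0.05,
          "b1": jnp.zeros(256)}
heads = jax.random.normal(k2, (256, K)) * 0.05
init_params, init_heads = params, heads        # frozen snapshot at init
x = jax.random.normal(key, (32, 784))
print(infer_loss(params, heads, init_params, init_heads, x))  # 0.0 at init
```

Because the targets are the network's own outputs at initialization, the penalty is zero at the start of training and grows only as the representation drifts, which matches the paper's goal of preventing feature-rank collapse.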