Stabilizing Transformers for Reinforcement Learning
Authors: Emilio Parisotto, Francis Song, Jack Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant Jayakumar, Max Jaderberg, Raphaël Lopez Kaufman, Aidan Clark, Seb Noury, Matthew Botvinick, Nicolas Heess, Raia Hadsell
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work we demonstrate that the standard transformer architecture is difficult to optimize... We propose architectural modifications that substantially improve the stability and learning speed... We perform extensive ablations on the GTrXL in challenging environments... |
| Researcher Affiliation | Collaboration | ¹Department of Machine Learning, Carnegie Mellon University, Pittsburgh, USA; ²DeepMind, London, UK. |
| Pseudocode | No | The paper provides mathematical equations and descriptions of the architecture but no explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about releasing the source code for their GTrXL implementation. The footnote 'www.github.com/tensorflow/tensor2tensor' refers to existing general transformer implementations, not the authors' specific code for this work. |
| Open Datasets | Yes | DMLab-30 (Beattie et al., 2016), Numpad and Memory Maze (see App. Fig. 8 and 9)... Atari-57 (Bellemare et al., 2013) |
| Dataset Splits | No | The paper does not provide specific train/validation/test dataset splits. It mentions using '6-8 hyperparameter settings per model' and dividing DMLab-30 levels into a 'Memory and Reactive split', but these are not data partitions. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for its experiments within the main text. While 'Cloud TPU' is mentioned in the references, there are no specific details like model numbers or quantities. |
| Software Dependencies | No | The paper mentions software like V-MPO, IMPALA, and PyTorch but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | For all transformer architectures except when otherwise stated, we train relatively deep 12-layer networks with embedding size 256 and memory size 512. |
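
For orientation, the sketch below collects the quoted hyperparameters into a small configuration object. It is a minimal illustration only: the authors do not release code, and the field names (`n_layers`, `d_model`, `mem_len`) are hypothetical rather than taken from their implementation.

```python
from dataclasses import dataclass

# Hypothetical configuration sketch mirroring the setup quoted above
# ("12-layer networks with embedding size 256 and memory size 512").
# Field names are illustrative; the paper does not release its code.
@dataclass
class GTrXLConfig:
    n_layers: int = 12   # "relatively deep 12-layer networks"
    d_model: int = 256   # embedding size
    mem_len: int = 512   # Transformer-XL style memory size

if __name__ == "__main__":
    cfg = GTrXLConfig()
    print(cfg)
```
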