Stabilizing Transformers for Reinforcement Learning
Authors: Emilio Parisotto, Francis Song, Jack Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant Jayakumar, Max Jaderberg, Raphaël Lopez Kaufman, Aidan Clark, Seb Noury, Matthew Botvinick, Nicolas Heess, Raia Hadsell
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work we demonstrate that the standard transformer architecture is difficult to optimize... We propose architectural modifications that substantially improve the stability and learning speed... We perform extensive ablations on the GTrXL in challenging environments... |
| Researcher Affiliation | Collaboration | ¹Department of Machine Learning, Carnegie Mellon University, Pittsburgh, USA; ²DeepMind, London, UK. |
| Pseudocode | No | The paper provides mathematical equations and descriptions of the architecture but no explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about releasing the source code for their GTrXL implementation. The footnote 'www.github.com/tensorflow/tensor2tensor' refers to existing general transformer implementations, not the authors' specific code for this work. |
| Open Datasets | Yes | DMLab-30 (Beattie et al., 2016), Numpad and Memory Maze (see App. Fig. 8 and 9)... Atari-57 (Bellemare et al., 2013) |
| Dataset Splits | No | The paper does not provide specific train/validation/test dataset splits. It mentions using '6-8 hyperparameter settings per model' and dividing DMLab-30 levels into a 'Memory and Reactive split', but these are not data partitions. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for its experiments within the main text. While 'Cloud TPU' is mentioned in the references, there are no specific details like model numbers or quantities. |
| Software Dependencies | No | The paper mentions software like V-MPO, IMPALA, and PyTorch but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | For all transformer architectures except when otherwise stated, we train relatively deep 12-layer networks with embedding size 256 and memory size 512. |
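
For orientation, the sketch below collects the quoted hyperparameters into a small configuration object. It is a minimal illustration only: the authors do not release code, and the field names (`n_layers`, `d_model`, `mem_len`) are hypothetical rather than taken from their implementation.

```python
from dataclasses import dataclass

# Hypothetical configuration sketch mirroring the setup quoted above
# ("12-layer networks with embedding size 256 and memory size 512").
# Field names are illustrative; the paper does not release its code.
@dataclass
class GTrXLConfig:
    n_layers: int = 12   # "relatively deep 12-layer networks"
    d_model: int = 256   # embedding size
    mem_len: int = 512   # Transformer-XL style memory size

if __name__ == "__main__":
    cfg = GTrXLConfig()
    print(cfg)
```
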