Discovering Reinforcement Learning Algorithms

Authors: Junhyuk Oh, Matteo Hessel, Wojciech M. Czarnecki, Zhongwen Xu, Hado P. van Hasselt, Satinder Singh, David Silver

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that our method discovers its own alternative to the concept of value functions. Furthermore, it discovers a bootstrapping mechanism to maintain and use its predictions.
Researcher Affiliation | Industry | Corresponding author: junhyuk@google.com
Pseudocode | Yes | Algorithm 1: Meta-Training of Learned Policy Gradient
Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | No | For meta-training of LPG, we introduce three different kinds of toy domains as illustrated in Figure 2. Tabular grid worlds are grid worlds with fixed object locations. Random grid worlds have randomised object locations for each episode. Delayed chain MDPs are simple MDPs with delayed rewards.
Dataset Splits | No | The paper describes 'Training Environments' and 'Atari games' for meta-training and meta-testing respectively, but it does not specify explicit training, validation, and test dataset splits with percentages or counts for reproduction.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The algorithm is implemented using JAX [5].
Experiment Setup | Yes | We used a 30-dimensional prediction vector y ∈ [0, 1]^30. During meta-training, we updated the agent parameters after every 20 time-steps.