A Self-Tuning Actor-Critic Algorithm
Authors: Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado P. van Hasselt, David Silver, Satinder Singh
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | When applied to the Arcade Learning Environment (Bellemare et al., 2013), STAC improved the median human normalized score in 200M steps from 243% to 364%. When applied to the DM Control suite (Tassa et al., 2018), STAC improved the mean score in 30M steps from 217 to 389 when learning with features, from 108 to 202 when learning from pixels, and from 195 to 295 in the Real-World Reinforcement Learning Challenge (Dulac-Arnold et al., 2020). |
| Researcher Affiliation | Industry | Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver and Satinder Singh, DeepMind {tomzahavy,zhongwen,vveeriah,mtthss,junhyuk,hado,davidsilver,baveja}@google.com |
| Pseudocode | Yes | The exact details can be found in the supplementary (Algorithm 2, line 11). |
| Open Source Code | No | The paper cites various third-party libraries and frameworks (e.g., JAX, RLax, Haiku, Optax) with their respective URLs, but does not provide a direct link or explicit statement for the open-sourcing of the STAC/STACX implementation described in the paper. |
| Open Datasets | Yes | When applied to the Arcade Learning Environment (Bellemare et al., 2013, ALE)... When applied to the DM Control suite (Tassa et al., 2018)... |
| Dataset Splits | No | The paper discusses evaluation using median human normalized scores after a certain number of frames and averaging over seeds, which is typical for RL environments, but does not explicitly provide dataset split percentages (e.g., train/validation/test splits) in the traditional supervised learning sense for reproducibility. |
| Hardware Specification | No | The paper states 'does not require a significant increase in compute (see Table 4 in the supplementary and the discussion that follows it)', implying details might be in the supplementary material. However, the provided main paper text does not specify any particular hardware (e.g., CPU/GPU models, RAM, or specific TPU versions) used for the experiments. |
| Software Dependencies | No | The paper mentions software like 'JAX (Bradbury et al., 2018)', 'RLax (Budden et al., 2020)', 'Haiku (Hennigan et al., 2020)', and 'Optax (Hessel et al., 2020)' with publication years, but does not provide specific version numbers for these software dependencies (e.g., PyTorch 1.9 or JAX 0.2.1). |
| Experiment Setup | Yes | For the outer loss hyperparameters, we use exactly the same hyperparameters that were used in the IMPALA paper for all of our agents (g_v^outer = 0.25, g_p^outer = 1, g_e^outer = 1, λ^outer = 1), with one exception: we use γ = 0.995... For the initializations of the metaparameters we use the corresponding parameters in the outer loss, i.e., for any metaparameter η_i, we set η_i^init = 4.6 such that σ(η_i^init) = 0.99... For the meta optimizer, we use ADAM with default settings (e.g., the learning rate is set to 10^-3), and for the KL coefficient, we use g_kl^outer = 1. |
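The metaparameter initialization quoted in the Experiment Setup row can be checked numerically: squashing an unconstrained value of 4.6 through a sigmoid yields roughly 0.99, so initializing every η_i at 4.6 starts the self-tuned coefficients near their outer-loss values. This is a minimal sketch of that arithmetic; the function names are illustrative, not identifiers from the paper's code.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic squashing that keeps a metaparameter in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def inverse_sigmoid(p: float) -> float:
    """Logit: the unconstrained value whose sigmoid equals p."""
    return math.log(p / (1.0 - p))

# The quoted setup sets eta_i^init = 4.6 so that sigmoid(eta_i^init) ~= 0.99.
eta_init = 4.6
print(round(sigmoid(eta_init), 2))       # -> 0.99
print(round(inverse_sigmoid(0.99), 2))   # -> 4.6
```

The two functions are inverses, which is why picking the target squashed value (0.99) immediately determines the reported initialization (4.6).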
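The ALE results in the Research Type row are reported as median human normalized scores. The normalization itself is not spelled out in this excerpt, but the convention in the deep RL literature scales each game's score by the gap between a random and a human baseline. A minimal sketch under that assumption; all numbers below are illustrative placeholders, not results from the paper.

```python
import statistics

def human_normalized(score: float, random_score: float, human_score: float) -> float:
    """Score as a fraction of the human-vs-random gap (1.0 == human level)."""
    return (score - random_score) / (human_score - random_score)

# Illustrative per-game baselines and agent scores only.
per_game = [
    human_normalized(1200.0, 100.0, 1000.0),  # above human on this game
    human_normalized(450.0, 50.0, 850.0),     # half the human gap
    human_normalized(3000.0, 0.0, 1500.0),    # far above human
]
median = statistics.median(per_game)
print(f"median human-normalized score: {median:.0%}")  # -> 122%
```

Using the median (rather than the mean) across games keeps a single high-scoring outlier game from dominating the aggregate, which is why it is the standard headline metric on ALE.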