Discovery of Useful Questions as Auxiliary Tasks
Authors: Vivek Veeriah, Matteo Hessel, Zhongwen Xu, Janarthanan Rajendran, Richard L. Lewis, Junhyuk Oh, Hado P. van Hasselt, David Silver, Satinder Singh
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that auxiliary tasks based on the discovered GVFs are sufficient, on their own, to build representations that support main task learning, and that they do so better than popular hand-designed auxiliary tasks from the literature. Furthermore, we show, in the context of Atari 2600 videogames, how such auxiliary tasks, meta-learned alongside the main task, can improve the data efficiency of an actor-critic agent. |
| Researcher Affiliation | Collaboration | Vivek Veeriah¹, Matteo Hessel², Zhongwen Xu², Richard Lewis¹, Janarthanan Rajendran¹, Junhyuk Oh², Hado van Hasselt², David Silver², Satinder Singh¹,²; ¹University of Michigan, Ann Arbor; ²DeepMind, London. Corresponding author: Vivek Veeriah (vveeriah@umich.edu). |
| Pseudocode | Yes | Algorithm 1 (Multi-Step Meta-Gradient Discovery of Questions for Auxiliary Tasks): Initialize parameters θ, η. For t = 1, 2, …, N: set θ_{t,0} ← θ_t; for k = 1, 2, …, L: generate experience using parameters θ_{t,k−1}, then update θ_{t,k} ← θ_{t,k−1} − α ∇_{θ_{t,k−1}} L_RL(θ_{t,k−1}) − α ∇_{θ_{t,k−1}} L_ans(θ_{t,k−1}); after the inner loop, update η_{t+1} ← η_t − ᾱ ∇_η Σ_{k=1}^{L} L_RL(θ_{t,k}) and set θ_{t+1} ← θ_{t,L}. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Puddleworld domain: is a continuous state gridworld domain (Degris et al., 2012)... Atari domain: the Atari games were designed to be challenging and fun for human players, and were packaged up into a canonical benchmark for RL agents: the Arcade Learning Environment (Bellemare et al., 2013; Mnih et al., 2015, 2016; Schulman et al., 2015, 2017; Hessel et al., 2018). |
| Dataset Splits | No | The paper mentions using domains like Puddleworld, Collect-objects, and Atari, and describes training processes, but it does not provide explicit details about dataset splits (e.g., percentages, sample counts, or cross-validation setup) for training, validation, and testing. |
| Hardware Specification | No | The paper discusses aspects of computational scale such as "16 parallel actor threads" and "200 distributed actors" but does not provide specific details on the hardware used, such as GPU models, CPU types, or memory configurations. |
| Software Dependencies | No | The paper mentions implementing meta-gradients on top of a "5-step actor-critic agent" and using a "20-step IMPALA (Espeholt et al., 2018) agent", but it does not provide specific version numbers for any software dependencies, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | In this section we outline the experimental setup, including the environments we used as test-beds and the high level agent and neural network architectures. We refer to the Appendix for more details. ... On the left in Figure 5, we report a parameter study, plotting the performance of the agent with meta-learned auxiliary tasks as a function of the number of questions d. ... On the right in Figure 5, we report the effect on performance of the number k of unrolled steps used for the meta-gradient computation. ... Note that neither d nor k were tuned in other experiments, with all other results using the same fixed settings of d = 128 and k = 10. |
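The structure of Algorithm 1 can be illustrated with a minimal toy sketch: an inner loop that updates parameters θ on a combined main-task and auxiliary loss, and an outer loop that adjusts meta-parameters η to minimize the main-task loss accumulated over the unroll. The losses below (a quadratic `l_rl` and an η-weighted quadratic auxiliary loss) are hypothetical stand-ins, not the paper's GVF-based objectives, and the meta-gradient is approximated here by finite differences rather than backpropagation through the unroll.

```python
# Toy sketch of the multi-step meta-gradient loop in Algorithm 1.
# Hypothetical losses: L_RL(theta) = (theta - 1)^2 pulls theta toward 1;
# L_ans(theta; eta) = eta * (theta - 2)^2 pulls it toward 2, scaled by eta.

ALPHA = 0.1        # inner-loop step size (alpha in Algorithm 1)
ALPHA_META = 0.05  # meta step size (alpha-bar in Algorithm 1)
L = 10             # inner unroll length

def l_rl(theta):
    return (theta - 1.0) ** 2

def grad_l_rl(theta):
    return 2.0 * (theta - 1.0)

def grad_l_ans(theta, eta):
    return eta * 2.0 * (theta - 2.0)

def inner_unroll(theta, eta):
    """Run L inner updates of theta; return final theta and the
    summed main-task loss over the unroll (the outer objective)."""
    total = 0.0
    for _ in range(L):
        theta = theta - ALPHA * grad_l_rl(theta) - ALPHA * grad_l_ans(theta, eta)
        total += l_rl(theta)
    return theta, total

def meta_step(theta, eta, eps=1e-4):
    """One outer iteration: update eta with a finite-difference
    estimate of the meta-gradient of the unrolled main-task loss."""
    theta_next, loss = inner_unroll(theta, eta)
    _, loss_plus = inner_unroll(theta, eta + eps)
    meta_grad = (loss_plus - loss) / eps
    return theta_next, eta - ALPHA_META * meta_grad

theta, eta = 0.0, 1.0
for _ in range(50):
    theta, eta = meta_step(theta, eta)
```

In this toy setting the meta-gradient drives η toward 0, since the auxiliary loss only pulls θ away from the main-task optimum; in the paper, by contrast, the discovered auxiliary tasks shape a shared representation, so useful questions receive non-trivial weight.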