Discovery of Useful Questions as Auxiliary Tasks
Authors: Vivek Veeriah, Matteo Hessel, Zhongwen Xu, Janarthanan Rajendran, Richard L. Lewis, Junhyuk Oh, Hado P. van Hasselt, David Silver, Satinder Singh
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that auxiliary tasks based on the discovered GVFs are sufficient, on their own, to build representations that support main task learning, and that they do so better than popular hand-designed auxiliary tasks from the literature. Furthermore, we show, in the context of Atari 2600 videogames, how such auxiliary tasks, meta-learned alongside the main task, can improve the data efficiency of an actor-critic agent. |
| Researcher Affiliation | Collaboration | Vivek Veeriah¹, Matteo Hessel², Zhongwen Xu², Richard Lewis¹, Janarthanan Rajendran¹, Junhyuk Oh², Hado van Hasselt², David Silver², Satinder Singh¹,²; ¹University of Michigan, Ann Arbor; ²DeepMind, London. Corresponding author: Vivek Veeriah (vveeriah@umich.edu). |
| Pseudocode | Yes | Algorithm 1 (Multi-Step Meta-Gradient Discovery of Questions for Auxiliary Tasks): Initialize parameters θ, η. For t = 1, 2, …, N: set θ_{t,0} ← θ_t; for k = 1, 2, …, L: generate experience using parameters θ_{t,k−1}, then update θ_{t,k} ← θ_{t,k−1} − α ∇_{θ_{t,k−1}} L_RL(θ_{t,k−1}) − α ∇_{θ_{t,k−1}} L_ans(θ_{t,k−1}); after the inner loop, update η_{t+1} ← η_t − ᾱ ∇_η Σ_{k=1}^{L} L_RL(θ_{t,k}) and set θ_{t+1} ← θ_{t,L}. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Puddleworld domain: is a continuous state gridworld domain (Degris et al., 2012)... Atari domain: the Atari games were designed to be challenging and fun for human players, and were packaged up into a canonical benchmark for RL agents: the Arcade Learning Environment (Bellemare et al., 2013; Mnih et al., 2015, 2016; Schulman et al., 2015, 2017; Hessel et al., 2018). |
| Dataset Splits | No | The paper mentions using domains like Puddleworld, Collect-objects, and Atari, and describes training processes, but it does not provide explicit details about dataset splits (e.g., percentages, sample counts, or cross-validation setup) for training, validation, and testing. |
| Hardware Specification | No | The paper discusses aspects of computational scale such as "16 parallel actor threads" and "200 distributed actors" but does not provide specific details on the hardware used, such as GPU models, CPU types, or memory configurations. |
| Software Dependencies | No | The paper mentions implementing meta-gradients on top of a "5-step actor-critic agent" and using a "20-step IMPALA (Espeholt et al., 2018) agent", but it does not provide specific version numbers for any software dependencies, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | In this section we outline the experimental setup, including the environments we used as test-beds and the high level agent and neural network architectures. We refer to the Appendix for more details. ... On the left in Figure 5, we report a parameter study, plotting the performance of the agent with meta-learned auxiliary tasks as a function of the number of questions d. ... On the right in Figure 5, we report the effect on performance of the number k of unrolled steps used for the meta-gradient computation. ... Note that neither d nor k were tuned in other experiments, with all other results using the same fixed settings of d = 128 and k = 10. |
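The structure of Algorithm 1 can be illustrated with a minimal toy sketch: an inner loop that updates parameters θ on a combined main-task and auxiliary loss, and an outer loop that adjusts meta-parameters η to minimize the main-task loss accumulated over the unroll. The losses below (a quadratic `l_rl` and an η-weighted quadratic auxiliary loss) are hypothetical stand-ins, not the paper's GVF-based objectives, and the meta-gradient is approximated here by finite differences rather than backpropagation through the unroll.

```python
# Toy sketch of the multi-step meta-gradient loop in Algorithm 1.
# Hypothetical losses: L_RL(theta) = (theta - 1)^2 pulls theta toward 1;
# L_ans(theta; eta) = eta * (theta - 2)^2 pulls it toward 2, scaled by eta.

ALPHA = 0.1        # inner-loop step size (alpha in Algorithm 1)
ALPHA_META = 0.05  # meta step size (alpha-bar in Algorithm 1)
L = 10             # inner unroll length

def l_rl(theta):
    return (theta - 1.0) ** 2

def grad_l_rl(theta):
    return 2.0 * (theta - 1.0)

def grad_l_ans(theta, eta):
    return eta * 2.0 * (theta - 2.0)

def inner_unroll(theta, eta):
    """Run L inner updates of theta; return final theta and the
    summed main-task loss over the unroll (the outer objective)."""
    total = 0.0
    for _ in range(L):
        theta = theta - ALPHA * grad_l_rl(theta) - ALPHA * grad_l_ans(theta, eta)
        total += l_rl(theta)
    return theta, total

def meta_step(theta, eta, eps=1e-4):
    """One outer iteration: update eta with a finite-difference
    estimate of the meta-gradient of the unrolled main-task loss."""
    theta_next, loss = inner_unroll(theta, eta)
    _, loss_plus = inner_unroll(theta, eta + eps)
    meta_grad = (loss_plus - loss) / eps
    return theta_next, eta - ALPHA_META * meta_grad

theta, eta = 0.0, 1.0
for _ in range(50):
    theta, eta = meta_step(theta, eta)
```

In this toy setting the meta-gradient drives η toward 0, since the auxiliary loss only pulls θ away from the main-task optimum; in the paper, by contrast, the discovered auxiliary tasks shape a shared representation, so useful questions receive non-trivial weight.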