Discovery of Options via Meta-Learned Subgoals
Authors: Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado P. van Hasselt, David Silver, Satinder Singh
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical analysis on gridworld and DeepMind Lab tasks shows that: (1) our approach can discover meaningful and diverse temporally-extended options in multi-task RL domains, (2) the discovered options are frequently used by the agent while learning to solve the training tasks, and (3) the discovered options help a randomly initialised manager learn faster in completely new tasks. |
| Researcher Affiliation | Collaboration | Vivek Veeriah (University of Michigan); Tom Zahavy (DeepMind); Matteo Hessel (DeepMind); Zhongwen Xu (DeepMind); Junhyuk Oh (DeepMind); Iurii Kemaev (DeepMind); Hado van Hasselt (DeepMind); David Silver (DeepMind); Satinder Singh (University of Michigan, DeepMind) |
| Pseudocode | Yes | Algorithm 1 Meta-gradient algorithm for option discovery |
| Open Source Code | No | The paper mentions 'Details on all the agents, their hyperparameter choices and other implementation details are in the Appendix,' but it does not explicitly state that the source code for the methodology is openly available or provide a link. |
| Open Datasets | Yes | We applied MODAC to DeepMind Lab (Beattie et al., 2016), a challenging suite of RL tasks with consistent physics and action spaces, in which the agent receives a first-person view as observations (hence there is partial observability). |
| Dataset Splits | No | The paper describes a 'validation trajectory' used in its meta-gradient algorithm for evaluating parameter changes, but it does not specify explicit training/validation/test dataset splits with percentages or sample counts for a fixed dataset. Tasks are procedurally generated and sampled randomly. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or library versions used for implementation or experimentation. |
| Experiment Setup | Yes | Details on all the agents, their hyperparameter choices and other implementation details are in the Appendix. In each of the 4 training task-sets we discovered 5 options (using a switching cost c = 0.03). |
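The switching cost c = 0.03 quoted in the setup row penalises the manager each time it abandons its current option for a different one, encouraging temporally-extended option use. A minimal sketch of this reward shaping is below; the function name `manager_step` and its signature are illustrative assumptions, not the paper's MODAC implementation.

```python
SWITCH_COST = 0.03  # switching cost c from the paper's experiment setup
NUM_OPTIONS = 5     # number of options discovered per training task-set

def manager_step(current_option, proposed_option, env_reward, cost=SWITCH_COST):
    """Apply the switching cost: the manager's reward is the environment
    reward minus c whenever it switches to a different option.
    Illustrative sketch only; MODAC's meta-gradient machinery is omitted."""
    switched = proposed_option != current_option
    reward = env_reward - (cost if switched else 0.0)
    return proposed_option, reward, switched

# Continuing the same option keeps the full environment reward...
opt, r, sw = manager_step(current_option=2, proposed_option=2, env_reward=1.0)
assert r == 1.0 and not sw

# ...while switching to a new option pays the cost c = 0.03.
opt, r, sw = manager_step(current_option=2, proposed_option=4, env_reward=1.0)
assert abs(r - 0.97) < 1e-9 and sw
```

Under this shaping, frequent option switching is only worthwhile when the new option's expected return exceeds the old one's by more than c, which is one simple way a fixed cost can promote temporal abstraction.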