Discovery of Options via Meta-Learned Subgoals

Authors: Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado P. van Hasselt, David Silver, Satinder Singh

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical analysis on gridworld and DeepMind Lab tasks shows that: (1) our approach can discover meaningful and diverse temporally-extended options in multi-task RL domains, (2) the discovered options are frequently used by the agent while learning to solve the training tasks, and (3) the discovered options help a randomly initialised manager learn faster in completely new tasks."
Researcher Affiliation | Collaboration | Vivek Veeriah (University of Michigan); Tom Zahavy (DeepMind); Matteo Hessel (DeepMind); Zhongwen Xu (DeepMind); Junhyuk Oh (DeepMind); Iurii Kemaev (DeepMind); Hado van Hasselt (DeepMind); David Silver (DeepMind); Satinder Singh (University of Michigan and DeepMind)
Pseudocode | Yes | Algorithm 1: "Meta-gradient algorithm for option discovery"
Open Source Code | No | The paper states that "details on all the agents, their hyperparameter choices and other implementation details are in the Appendix," but it does not state that the source code is openly available or provide a link.
Open Datasets | Yes | "We applied MODAC to DeepMind Lab (Beattie et al., 2016), a challenging suite of RL tasks with consistent physics and action spaces, and a first-person view as observations to the agent (hence there is partial observability)."
Dataset Splits | No | The paper describes a "validation trajectory" used by its meta-gradient algorithm to evaluate parameter changes, but it does not specify explicit training/validation/test splits with percentages or sample counts; tasks are procedurally generated and sampled randomly.
Hardware Specification | No | The paper does not report hardware details such as exact GPU/CPU models, processor types, or memory used for the experiments.
Software Dependencies | No | The paper does not list the software dependencies or library versions used for implementation or experimentation.
Experiment Setup | Yes | "Details on all the agents, their hyperparameter choices and other implementation details are in the Appendix." In each of the 4 training task-sets, 5 options were discovered (using a switching cost c = 0.03).
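The switching cost c = 0.03 noted in the Experiment Setup row penalises the manager each time it terminates the current option and switches to a new one, encouraging temporally extended behaviour. The paper does not publish an implementation, so the sketch below is only an illustration of how such a penalty can shape the manager's reward; the function and variable names (`shaped_manager_reward`, `options`) are hypothetical, not from the paper.

```python
def shaped_manager_reward(env_reward, prev_option, new_option, c=0.03):
    """Subtract a switching cost c whenever the manager changes options.

    Illustrative only: discourages rapid option switching, mirroring the
    role of the switching cost c = 0.03 reported in the paper.
    """
    switch_penalty = c if new_option != prev_option else 0.0
    return env_reward - switch_penalty

# Toy rollout: the manager's option choice at each step, with reward 1.0 per step.
options = [0, 0, 1, 1, 1, 2]
rewards = [1.0] * len(options)

total = rewards[0]  # first step: no previous option, so no switching penalty
for prev, new, r in zip(options, options[1:], rewards[1:]):
    total += shaped_manager_reward(r, prev, new)

# 6.0 raw reward, minus 2 option switches (0 -> 1 and 1 -> 2) * 0.03
print(total)
```

Under this shaping, a manager trading off task reward against the penalty only switches options when the new option's expected return outweighs the cost, which is one simple way to bias learning toward longer option durations.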