Discovery of Options via Meta-Learned Subgoals
Authors: Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado P. van Hasselt, David Silver, Satinder Singh
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical analysis on gridworld and DeepMind Lab tasks shows that: (1) our approach can discover meaningful and diverse temporally-extended options in multi-task RL domains, (2) the discovered options are frequently used by the agent while learning to solve the training tasks, and (3) the discovered options help a randomly initialised manager learn faster in completely new tasks. |
| Researcher Affiliation | Collaboration | Vivek Veeriah (University of Michigan); Tom Zahavy (DeepMind); Matteo Hessel (DeepMind); Zhongwen Xu (DeepMind); Junhyuk Oh (DeepMind); Iurii Kemaev (DeepMind); Hado van Hasselt (DeepMind); David Silver (DeepMind); Satinder Singh (University of Michigan, DeepMind) |
| Pseudocode | Yes | Algorithm 1 Meta-gradient algorithm for option discovery |
| Open Source Code | No | The paper mentions 'Details on all the agents, their hyperparameter choices and other implementation details are in the Appendix,' but it does not explicitly state that the source code for the methodology is openly available or provide a link. |
| Open Datasets | Yes | We applied MODAC to DeepMind Lab (Beattie et al., 2016), a challenging suite of RL tasks with consistent physics and action spaces, in which the agent receives a first-person view as observations (hence there is partial observability). |
| Dataset Splits | No | The paper describes a 'validation trajectory' used in its meta-gradient algorithm for evaluating parameter changes, but it does not specify explicit training/validation/test dataset splits with percentages or sample counts for a fixed dataset. Tasks are procedurally generated and sampled randomly. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or library versions used for implementation or experimentation. |
| Experiment Setup | Yes | Details on all the agents, their hyperparameter choices and other implementation details are in the Appendix. In each of the 4 training task-sets we discovered 5 options (using a switching cost c = 0.03). |
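The switching cost c = 0.03 quoted in the setup row penalises the manager each time it abandons its current option for a different one, encouraging temporally-extended option use. A minimal sketch of this reward shaping is below; the function name `manager_step` and its signature are illustrative assumptions, not the paper's MODAC implementation.

```python
SWITCH_COST = 0.03  # switching cost c from the paper's experiment setup
NUM_OPTIONS = 5     # number of options discovered per training task-set

def manager_step(current_option, proposed_option, env_reward, cost=SWITCH_COST):
    """Apply the switching cost: the manager's reward is the environment
    reward minus c whenever it switches to a different option.
    Illustrative sketch only; MODAC's meta-gradient machinery is omitted."""
    switched = proposed_option != current_option
    reward = env_reward - (cost if switched else 0.0)
    return proposed_option, reward, switched

# Continuing the same option keeps the full environment reward...
opt, r, sw = manager_step(current_option=2, proposed_option=2, env_reward=1.0)
assert r == 1.0 and not sw

# ...while switching to a new option pays the cost c = 0.03.
opt, r, sw = manager_step(current_option=2, proposed_option=4, env_reward=1.0)
assert abs(r - 0.97) < 1e-9 and sw
```

Under this shaping, frequent option switching is only worthwhile when the new option's expected return exceeds the old one's by more than c, which is one simple way a fixed cost can promote temporal abstraction.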