On Efficiency in Hierarchical Reinforcement Learning

Authors: Zheng Wen, Doina Precup, Morteza Ibrahimi, Andre Barreto, Benjamin Van Roy, Satinder Singh

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | While this has been demonstrated empirically over time in a variety of tasks, theoretical results quantifying the benefits of such methods are still few and far between. In this paper, we discuss the kind of structure in a Markov decision process which gives rise to efficient HRL methods. Specifically, we formalize the intuition that HRL can exploit well repeating "subMDPs", with similar reward and transition structure. We show that, under reasonable assumptions, a model-based Thompson sampling-style HRL algorithm that exploits this structure is statistically efficient, as established through a finite-time regret bound. We also establish conditions under which planning with structure-induced options is near-optimal and computationally efficient. In this paper, we present two general results which highlight the types of problems in which HRL is expected to provide benefits, in terms of planning speed, as well as in terms of statistical efficiency.
Researcher Affiliation | Industry | Zheng Wen (DeepMind, zhengwen@google.com); Doina Precup (DeepMind, doinap@google.com); Morteza Ibrahimi (DeepMind, mibrahimi@google.com); Andre Barreto (DeepMind, andrebarreto@google.com); Benjamin Van Roy (DeepMind, benvanroy@google.com); Satinder Singh (DeepMind, baveja@google.com)
Pseudocode | Yes | Algorithm 1: PSRL with a Planner, Sampler, and Inferer; Algorithm 2: Planning with Exit Profiles (PEP). See the sketch after this table for a generic illustration of the PSRL loop.
Open Source Code | No | The paper is a theoretical investigation and does not mention providing open-source code for the described methodology. It cites 'Behaviour suite for reinforcement learning' by other authors but does not state that its own code is available.
Open Datasets | No | This is a theoretical paper and does not describe experiments using datasets. Therefore, it does not provide information about publicly available datasets or access to them.
Dataset Splits | No | This is a theoretical paper and does not describe experiments. Thus, it does not provide information on training/test/validation dataset splits.
Hardware Specification | No | This is a theoretical paper and does not describe experiments. Therefore, it does not provide hardware specifications.
Software Dependencies | No | This is a theoretical paper and does not describe experiments. Therefore, it does not list specific software dependencies with version numbers.
Experiment Setup | No | This is a theoretical paper and does not describe empirical experiments. Therefore, it does not provide details about an experimental setup, hyperparameters, or training settings.
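The Pseudocode row above names Algorithm 1 as "PSRL with a Planner, Sampler, and Inferer". As a rough illustration only, the following is a minimal sketch of a generic tabular posterior-sampling RL (PSRL) episode loop organised around those three components. The class interfaces (sample_mdp, plan, update), the Dirichlet/Gaussian posterior choices, the finite-horizon value-iteration planner, and the env.reset()/env.step() API are all assumptions made for this sketch; they are not the paper's specification, which exploits repeated sub-MDP structure and exit profiles rather than a flat tabular model.

```python
# Hedged sketch of a generic PSRL loop with the three roles named in Algorithm 1.
# All interfaces and prior choices below are illustrative assumptions.
import numpy as np


class Sampler:
    """Maintains posterior statistics and draws an MDP model from them."""

    def __init__(self, num_states, num_actions, rng):
        self.rng = rng
        # Dirichlet counts for transitions, Gaussian-style statistics for rewards
        # (a common conjugate choice; the paper may use a different prior).
        self.trans_counts = np.ones((num_states, num_actions, num_states))
        self.reward_sums = np.zeros((num_states, num_actions))
        self.visit_counts = np.ones((num_states, num_actions))

    def sample_mdp(self):
        num_states, num_actions, _ = self.trans_counts.shape
        P = np.array([
            [self.rng.dirichlet(self.trans_counts[s, a]) for a in range(num_actions)]
            for s in range(num_states)
        ])  # shape (S, A, S)
        # Approximate posterior sample of mean rewards.
        R = self.rng.normal(self.reward_sums / self.visit_counts,
                            1.0 / np.sqrt(self.visit_counts))
        return P, R


class Inferer:
    """Updates the posterior statistics from observed transitions."""

    def update(self, sampler, s, a, r, s_next):
        sampler.trans_counts[s, a, s_next] += 1
        sampler.reward_sums[s, a] += r
        sampler.visit_counts[s, a] += 1


class Planner:
    """Plans for the sampled MDP via finite-horizon value iteration."""

    def plan(self, P, R, horizon):
        num_states, num_actions = R.shape
        V = np.zeros(num_states)
        policy = np.zeros((horizon, num_states), dtype=int)
        for h in reversed(range(horizon)):
            Q = R + P @ V  # shape (S, A)
            policy[h] = Q.argmax(axis=1)
            V = Q.max(axis=1)
        return policy


def psrl(env, num_episodes, horizon, num_states, num_actions, seed=0):
    # `env` is a hypothetical episodic environment with reset() -> state and
    # step(action) -> (next_state, reward, done).
    rng = np.random.default_rng(seed)
    sampler, inferer, planner = Sampler(num_states, num_actions, rng), Inferer(), Planner()
    for _ in range(num_episodes):
        P, R = sampler.sample_mdp()            # Thompson-style posterior sample
        policy = planner.plan(P, R, horizon)   # plan against the sampled model
        s = env.reset()
        for h in range(horizon):
            a = policy[h, s]
            s_next, r, done = env.step(a)
            inferer.update(sampler, s, a, r, s_next)
            s = s_next
            if done:
                break
```

In the paper's setting the Planner would instead exploit the repeating sub-MDP structure, planning with structure-induced options via exit profiles (Algorithm 2, PEP) rather than flat value iteration, which is where the claimed computational and statistical gains come from.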