On Efficiency in Hierarchical Reinforcement Learning

Authors: Zheng Wen, Doina Precup, Morteza Ibrahimi, Andre Barreto, Benjamin Van Roy, Satinder Singh

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | While this has been demonstrated empirically over time in a variety of tasks, theoretical results quantifying the benefits of such methods are still few and far between. In this paper, we discuss the kind of structure in a Markov decision process which gives rise to efficient HRL methods. Specifically, we formalize the intuition that HRL can exploit well repeating "subMDPs", with similar reward and transition structure. We show that, under reasonable assumptions, a model-based Thompson sampling-style HRL algorithm that exploits this structure is statistically efficient, as established through a finite-time regret bound. We also establish conditions under which planning with structure-induced options is near-optimal and computationally efficient. In this paper, we present two general results which highlight the types of problems in which HRL is expected to provide benefits, in terms of planning speed, as well as in terms of statistical efficiency.
Researcher Affiliation | Industry | Zheng Wen (DeepMind, zhengwen@google.com); Doina Precup (DeepMind, doinap@google.com); Morteza Ibrahimi (DeepMind, mibrahimi@google.com); Andre Barreto (DeepMind, andrebarreto@google.com); Benjamin Van Roy (DeepMind, benvanroy@google.com); Satinder Singh (DeepMind, baveja@google.com)
Pseudocode | Yes | Algorithm 1: PSRL with a Planner, Sampler, and Inferer; Algorithm 2: Planning with Exit Profiles (PEP). See the sketch after this table for a generic illustration of the PSRL loop.
Open Source Code | No | The paper is a theoretical investigation and does not mention providing open-source code for the described methodology. It cites 'Behaviour suite for reinforcement learning' by other authors but does not state that its own code is available.
Open Datasets | No | This is a theoretical paper and does not describe experiments using datasets. Therefore, it does not provide information about publicly available datasets or access to them.
Dataset Splits | No | This is a theoretical paper and does not describe experiments. Thus, it does not provide information on training/test/validation dataset splits.
Hardware Specification | No | This is a theoretical paper and does not describe experiments. Therefore, it does not provide hardware specifications.
Software Dependencies | No | This is a theoretical paper and does not describe experiments. Therefore, it does not list specific software dependencies with version numbers.
Experiment Setup | No | This is a theoretical paper and does not describe empirical experiments. Therefore, it does not provide details about an experimental setup, hyperparameters, or training settings.
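The Pseudocode row above names Algorithm 1 as "PSRL with a Planner, Sampler, and Inferer". As a rough illustration only, the following is a minimal sketch of a generic tabular posterior-sampling RL (PSRL) episode loop organised around those three components. The class interfaces (sample_mdp, plan, update), the Dirichlet/Gaussian posterior choices, the finite-horizon value-iteration planner, and the env.reset()/env.step() API are all assumptions made for this sketch; they are not the paper's specification, which exploits repeated sub-MDP structure and exit profiles rather than a flat tabular model.

```python
# Hedged sketch of a generic PSRL loop with the three roles named in Algorithm 1.
# All interfaces and prior choices below are illustrative assumptions.
import numpy as np


class Sampler:
    """Maintains posterior statistics and draws an MDP model from them."""

    def __init__(self, num_states, num_actions, rng):
        self.rng = rng
        # Dirichlet counts for transitions, Gaussian-style statistics for rewards
        # (a common conjugate choice; the paper may use a different prior).
        self.trans_counts = np.ones((num_states, num_actions, num_states))
        self.reward_sums = np.zeros((num_states, num_actions))
        self.visit_counts = np.ones((num_states, num_actions))

    def sample_mdp(self):
        num_states, num_actions, _ = self.trans_counts.shape
        P = np.array([
            [self.rng.dirichlet(self.trans_counts[s, a]) for a in range(num_actions)]
            for s in range(num_states)
        ])  # shape (S, A, S)
        # Approximate posterior sample of mean rewards.
        R = self.rng.normal(self.reward_sums / self.visit_counts,
                            1.0 / np.sqrt(self.visit_counts))
        return P, R


class Inferer:
    """Updates the posterior statistics from observed transitions."""

    def update(self, sampler, s, a, r, s_next):
        sampler.trans_counts[s, a, s_next] += 1
        sampler.reward_sums[s, a] += r
        sampler.visit_counts[s, a] += 1


class Planner:
    """Plans for the sampled MDP via finite-horizon value iteration."""

    def plan(self, P, R, horizon):
        num_states, num_actions = R.shape
        V = np.zeros(num_states)
        policy = np.zeros((horizon, num_states), dtype=int)
        for h in reversed(range(horizon)):
            Q = R + P @ V  # shape (S, A)
            policy[h] = Q.argmax(axis=1)
            V = Q.max(axis=1)
        return policy


def psrl(env, num_episodes, horizon, num_states, num_actions, seed=0):
    # `env` is a hypothetical episodic environment with reset() -> state and
    # step(action) -> (next_state, reward, done).
    rng = np.random.default_rng(seed)
    sampler, inferer, planner = Sampler(num_states, num_actions, rng), Inferer(), Planner()
    for _ in range(num_episodes):
        P, R = sampler.sample_mdp()            # Thompson-style posterior sample
        policy = planner.plan(P, R, horizon)   # plan against the sampled model
        s = env.reset()
        for h in range(horizon):
            a = policy[h, s]
            s_next, r, done = env.step(a)
            inferer.update(sampler, s, a, r, s_next)
            s = s_next
            if done:
                break
```

In the paper's setting the Planner would instead exploit the repeating sub-MDP structure, planning with structure-induced options via exit profiles (Algorithm 2, PEP) rather than flat value iteration, which is where the claimed computational and statistical gains come from.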