On Efficiency in Hierarchical Reinforcement Learning
Authors: Zheng Wen, Doina Precup, Morteza Ibrahimi, Andre Barreto, Benjamin Van Roy, Satinder Singh
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | While this has been demonstrated empirically over time in a variety of tasks, theoretical results quantifying the benefits of such methods are still few and far between. In this paper, we discuss the kind of structure in a Markov decision process which gives rise to efficient HRL methods. Specifically, we formalize the intuition that HRL can exploit well repeating "subMDPs", with similar reward and transition structure. We show that, under reasonable assumptions, a model-based Thompson sampling-style HRL algorithm that exploits this structure is statistically efficient, as established through a finite-time regret bound. We also establish conditions under which planning with structure-induced options is near-optimal and computationally efficient. In this paper, we present two general results which highlight the types of problems in which HRL is expected to provide benefits, in terms of planning speed, as well as in terms of statistical efficiency. |
| Researcher Affiliation | Industry | Zheng Wen (DeepMind, zhengwen@google.com); Doina Precup (DeepMind, doinap@google.com); Morteza Ibrahimi (DeepMind, mibrahimi@google.com); Andre Barreto (DeepMind, andrebarreto@google.com); Benjamin Van Roy (DeepMind, benvanroy@google.com); Satinder Singh (DeepMind, baveja@google.com) |
| Pseudocode | Yes | Algorithm 1: PSRL with a Planner, Sampler, and Inferer; Algorithm 2: Planning with Exit Profiles (PEP). See the illustrative PSRL sketch after this table. |
| Open Source Code | No | The paper is a theoretical investigation and does not mention providing open-source code for the described methodology. It refers to 'Behaviour suite for reinforcement learning' by other authors but does not state that its own code is available. |
| Open Datasets | No | This is a theoretical paper and does not describe experiments using datasets. Therefore, it does not provide information about publicly available datasets or access to them. |
| Dataset Splits | No | This is a theoretical paper and does not describe experiments. Thus, it does not provide information on training/test/validation dataset splits. |
| Hardware Specification | No | This is a theoretical paper and does not describe experiments. Therefore, it does not provide hardware specifications. |
| Software Dependencies | No | This is a theoretical paper and does not describe experiments. Therefore, it does not list specific software dependencies with version numbers. |
| Experiment Setup | No | This is a theoretical paper and does not describe empirical experiments. Therefore, it does not provide details about an experimental setup, hyperparameters, or training settings. |
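
The pseudocode row above refers to a PSRL loop built from a Planner, a Sampler, and an Inferer. As a reading aid only, the sketch below shows a minimal, generic tabular PSRL loop organized around those three roles. It is not the paper's Algorithm 1: the hierarchical structure (repeating subMDPs, exit profiles, the PEP planner) is omitted, and the environment interface and all class and function names are hypothetical.

```python
import numpy as np


class DirichletSampler:
    """Sampler/Inferer roles: keeps a Dirichlet posterior over transitions and
    running reward averages, and draws one MDP model per episode."""

    def __init__(self, n_states, n_actions):
        # Uniform Dirichlet prior over next states for every (s, a) pair.
        self.trans_counts = np.ones((n_states, n_actions, n_states))
        self.reward_sums = np.zeros((n_states, n_actions))
        self.visit_counts = np.zeros((n_states, n_actions))

    def sample_mdp(self, rng):
        """Sampler: draw transition probabilities from the posterior."""
        n_states, n_actions, _ = self.trans_counts.shape
        P = np.empty_like(self.trans_counts)
        for s in range(n_states):
            for a in range(n_actions):
                P[s, a] = rng.dirichlet(self.trans_counts[s, a])
        # Point-estimate rewards; a full PSRL would sample these as well.
        R = self.reward_sums / np.maximum(self.visit_counts, 1.0)
        return P, R

    def update(self, s, a, r, s_next):
        """Inferer: fold one observed transition into the posterior."""
        self.trans_counts[s, a, s_next] += 1
        self.reward_sums[s, a] += r
        self.visit_counts[s, a] += 1


def plan(P, R, horizon):
    """Planner: finite-horizon value iteration on the sampled MDP.
    Returns a time-dependent greedy policy of shape (horizon, n_states)."""
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    policy = np.zeros((horizon, n_states), dtype=int)
    for t in reversed(range(horizon)):
        Q = R + P @ V          # Q[s, a] = R[s, a] + sum_s' P[s, a, s'] * V[s']
        policy[t] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy


def psrl(env, n_states, n_actions, horizon, n_episodes, seed=0):
    """Generic episodic PSRL: sample a model, plan on it, act, update beliefs.
    `env` is a hypothetical tabular environment with reset() -> s and
    step(a) -> (s_next, reward, done)."""
    rng = np.random.default_rng(seed)
    belief = DirichletSampler(n_states, n_actions)
    for _ in range(n_episodes):
        P, R = belief.sample_mdp(rng)         # Sampler
        policy = plan(P, R, horizon)          # Planner
        s = env.reset()
        for t in range(horizon):
            a = policy[t, s]
            s_next, r, done = env.step(a)
            belief.update(s, a, r, s_next)    # Inferer
            s = s_next
            if done:
                break
    return belief
```

The three-way split mirrors the abstraction named in Algorithm 1: the same loop would remain valid if the Sampler and Inferer maintained a posterior over the parameters of the repeating subMDPs and the flat value-iteration Planner were replaced by an option-level procedure such as PEP.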