Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems
Authors: Sahin Lale, Kamyar Azizzadenesheli, Babak Hassibi, Anima Anandkumar
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we present the first model estimation method with finite-time guarantees in both open and closed-loop system identification. Deploying this estimation method, we propose adaptive control online learning (ADAPTON), an efficient reinforcement learning algorithm that adaptively learns the system dynamics and continuously updates its controller through online learning steps. [...] We show that ADAPTON achieves a regret upper bound of polylog (T), after T time steps of agent-environment interaction. To the best of our knowledge, ADAPTON is the first algorithm that achieves polylog (T) regret in adaptive control of unknown partially observable linear dynamical systems which includes linear quadratic Gaussian (LQG) control. |
| Researcher Affiliation | Academia | Sahin Lale Caltech alale@caltech.edu Kamyar Azizzadenesheli Purdue University kamyar@purdue.edu Babak Hassibi Caltech hassibi@caltech.edu Anima Anandkumar Caltech anima@caltech.edu |
| Pseudocode | Yes | ADAPTON is illustrated in Figure 1 and the detailed pseudo-code is provided in Appendix C. |
| Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and regret analysis in partially observable linear dynamical systems, rather than empirical evaluation on a specific dataset. Therefore, it does not mention a publicly available dataset for training. |
| Dataset Splits | No | The paper is theoretical and focuses on algorithm design and regret analysis, not empirical validation on a dataset with splits. |
| Hardware Specification | No | The paper does not provide any specific hardware details used for running experiments. It focuses on theoretical analysis and algorithm development. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers, focusing on theoretical contributions. |
| Experiment Setup | No | The paper describes an algorithm and its theoretical properties but does not provide specific experimental setup details like hyperparameter values or training configurations, as it is a theoretical work. |