Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems

Authors: Sahin Lale, Kamyar Azizzadenesheli, Babak Hassibi, Anima Anandkumar

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical In this paper, we present the first model estimation method with finite-time guarantees in both open and closed-loop system identification. Deploying this estimation method, we propose adaptive control online learning (ADAPTON), an efficient reinforcement learning algorithm that adaptively learns the system dynamics and continuously updates its controller through online learning steps. [...] We show that ADAPTON achieves a regret upper bound of polylog (T), after T time steps of agent-environment interaction. To the best of our knowledge, ADAPTON is the first algorithm that achieves polylog (T) regret in adaptive control of unknown partially observable linear dynamical systems which includes linear quadratic Gaussian (LQG) control.
Researcher Affiliation Academia Sahin Lale Caltech alale@caltech.edu Kamyar Azizzadenesheli Purdue University kamyar@purdue.edu Babak Hassibi Caltech hassibi@caltech.edu Anima Anandkumar Caltech anima@caltech.edu
Pseudocode Yes ADAPTON is illustrated in Figure 1 and the detailed pseudo-code is provided in Appendix C.
Open Source Code No The paper does not provide any statement or link regarding the public availability of its source code.
Open Datasets No The paper is theoretical and focuses on algorithm design and regret analysis in partially observable linear dynamical systems, rather than empirical evaluation on a specific dataset. Therefore, it does not mention a publicly available dataset for training.
Dataset Splits No The paper is theoretical and focuses on algorithm design and regret analysis, not empirical validation on a dataset with splits.
Hardware Specification No The paper does not provide any specific hardware details used for running experiments. It focuses on theoretical analysis and algorithm development.
Software Dependencies No The paper does not specify any software dependencies with version numbers, focusing on theoretical contributions.
Experiment Setup No The paper describes an algorithm and its theoretical properties but does not provide specific experimental setup details like hyperparameter values or training configurations, as it is a theoretical work.