reproducibilityindex.ai

Learning Linear-Quadratic Regulators Efficiently with only $\sqrtT$ Regret

Authors: Alon Cohen, Tomer Koren, Yishay Mansour

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We present the ﬁrst computationally-efﬁcient algorithm with e O(T) regret for learning in Linear Quadratic Control systems with unknown dynamics. By that, we resolve an open question of Abbasi-Yadkori and Szepesv ari (2011) and Dean, Mania, Matni, Recht, and Tu (2018). In this paper, we give the ﬁrst computationally-efﬁcient algorithm that attains e O(T) regret for learning LQ systems, thus resolving the open problem of Abbasi-Yadkori and Szepesv ari (2011) and Dean et al. (2018). The key to the efﬁciency of our algorithm is in reformulating the LQ control problem as a convex semi-deﬁnite program. Our algorithm solves a sequence of semi-deﬁnite relaxations of the inﬁnite horizon LQ problem, the solutions of which are used to compute optimistic policies for the underlying unknown LQ system. (Section 4) We now formally state our main result: a high-probability e O(T) regret bound for the efﬁcient algorithm given in Algorithm 1. Theorem 4. Suppose that Algorithm 1 is initialized so that the initial estimation error (A0 B0) (A B ) 2 F ε satisﬁes 4λ = α5 0σ10 Assume T poly(n, ν, ϑ, α 1 0 , σ 1, x1 ). Then for any δ (0, 1), with probability at least 1 δ the regret of Algorithm 1 satisﬁes RT = O ν5n3ϑ Furthermore, the run-time per round of the procedure is polynomial in these factors.
Researcher Affiliation	Collaboration	1Google Research, Tel-Aviv 2Technion Israel Inst. of Technology 3Google Brain, Mountain View 4Tel-Aviv University. Correspondence to: Alon Cohen <alon.cohen@technion.ac.il>.
Pseudocode	Yes	In this section we describe our efﬁcient online algorithm for learning in LQRs; see pseudo-code in Algorithm 1. ... Algorithm 1 OSLO: Optimistic Sdp for Lq c Ontrol ... Algorithm 2 Warm-up
Open Source Code	No	The paper does not provide any statement about making its source code available, nor does it include a link to a code repository.
Open Datasets	No	The paper is a theoretical work focusing on algorithm design and analysis of regret bounds. It does not conduct empirical experiments using datasets, thus no dataset is mentioned as publicly available for training.
Dataset Splits	No	The paper does not describe empirical experiments with datasets, and therefore, no specific dataset split information for validation is provided.
Hardware Specification	No	The paper discusses theoretical runtime complexity ('run-time per round of the procedure is polynomial in these factors') but does not specify any particular hardware (GPU, CPU models, etc.) used for computations or experiments.
Software Dependencies	No	The paper does not mention any specific software dependencies or their version numbers that would be needed to replicate the theoretical analysis or algorithm design.
Experiment Setup	No	The paper is theoretical and does not describe empirical experiments. Therefore, it does not provide details on experimental setup such as hyperparameters or system-level training settings.